Preface


These are notes for a six-week summer course offered to graduate students at Bowling Green State University. The audience consisted primarily of first- and second-year students in pure mathematics, but also included a few students of applied mathematics and statistics.

I had no course outline or plan of attack other than to begin with the most widely known classical inequalities and see where that led us. For the most part I simply followed my nose and steered the discussion toward whatever held my interest at the moment. I should quickly add, however, that the presentation consisted of more than lectures: From the outset, the students were expected to present (and discuss) their solutions to the problem sets in class, which added immensely to the experience. On several occasions we devoted more than half the class period to discussing the problems. This material would easily lend itself to a student seminar, a course on problem solving, or even a course taught using the so-called Moore method.

I assumed very little by way of prerequisites and tried to make the course accessible to a wide audience, which may help to explain the selection of topics. There are a few topics that I would like to have covered (topics that required a knowledge of Lebesgue integration, for example, or rudimentary functional analysis) that I feared would be beyond the grasp of some of the students. Still, I think I managed to demonstrate a variety of techniques and methods of proof in this brief survey, although my personal preferences, which lean toward applications of convexity (and, more generally, toward an analytic rather than algebraic perspective) are plainly reflected in these notes.


Preliminaries 


Obviously, the comparison of quantities is an essential tool in mathematics. In this course we'll be concerned with all manner of inequalities and, in particular, their systematic study and the tools used in their creation.

The art of inequalities is found in the clever, often subtle methods used to generate and verify them. The science of inequalities lies in their careful interpretation and in the knowledge of their scope and limitations.

We will take great pains to distinguish between strict inequalities ($<$) and weak inequalities ($\le$). In the case of weak inequalities, we will carefully examine the case for equality. While we will freely use techniques from calculus, we will be sparing with limits, lest strict inequalities turn into weak inequalities.

Although the study of inequalities is arguably a subfield of real analysis, we will occasionally find recourse to tools from complex analysis, linear algebra, geometry, and, of course, algebra. Moreover, we will uncover applications to all of these fields (and more!).
As a starting point, we'll begin with two simple axioms:
(1) A given real number $a$ satisfies precisely one of the following: $a < 0$, $a = 0$, $a > 0$.
(2) If $a > 0$ and $b > 0$, then $ab > 0$ and $a + b > 0$.


There are obvious variations and extensions of these axioms; for example: 


(1) Given $a, b \in \mathbb{R}$, precisely one of the following holds: $a < b$, $a = b$, $a > b$.


and: 
Theorem. If $a > b > 0$ and $c > d > 0$, then $ac > bd$.

Proof. Given $a > b$, we have $a - b > 0$ and, hence, $(a - b)c > 0$. That is, $ac > bc$. Similarly, $c - d > 0$ leads to $b(c - d) > 0$ and, hence, $bc > bd$. Combining these two observations yields $ac > bd$.


Please consult the exercises for more “direct algebra” proofs. 


As you can well imagine, we will also have frequent use of mathematical induction. 
Theorem. If $a > b > 0$, then $a^n > b^n$ for all $n = 1, 2, 3, \dots$.

Theorem. $n! > 2^{n-1}$ for $n = 3, 4, \dots$.

Bernoulli's Inequality. If $x > -1$, then $(1+x)^n \ge 1 + nx$ for all $n = 1, 2, 3, \dots$. Equality occurs only if $n = 1$ or $x = 0$.

Proof. Fix $x > -1$. The inequality clearly holds for $n = 1$, so assume that $(1+x)^n \ge 1 + nx$ for some $n \ge 1$. Then:
\[
\begin{aligned}
(1+x)^{n+1} &= (1+x)(1+x)^n \\
&\ge (1+x)(1+nx) \quad \text{(because } 1 + x > 0\text{)} && (1) \\
&= 1 + (n+1)x + nx^2 \\
&\ge 1 + (n+1)x. && (2)
\end{aligned}
\]


Thus, the inequality holds for all n = 1,2,3,.... Note that equality in (2) forces x = 0. 
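As a quick sanity check on the statement (not a substitute for the proof), the following Python sketch evaluates both sides of Bernoulli's inequality exactly, using rational arithmetic. The helper name `bernoulli_gap` and the sample values are illustrative choices, not part of the notes.

```python
from fractions import Fraction

def bernoulli_gap(x, n):
    """Return (1 + x)**n - (1 + n*x) as an exact rational number."""
    x = Fraction(x)
    return (1 + x) ** n - (1 + n * x)

# A few sample points with x > -1 (hypothetical test values).
samples = [Fraction(-9, 10), Fraction(-1, 2), Fraction(0), Fraction(1, 3), Fraction(5)]
for x in samples:
    for n in range(1, 8):
        gap = bernoulli_gap(x, n)
        assert gap >= 0                           # the inequality itself
        assert (gap == 0) == (n == 1 or x == 0)   # the equality case
```

Because the arithmetic is exact, the equality case can be tested with `==` rather than with a floating-point tolerance.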


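Refinement. If $x > -1$, $x \ne 0$, then $(1+x)^n > 1 + nx$ for all $n = 2, 3, 4, \dots$.

Corollary. Given $x, y \ge 0$, we have $(x + y)^n \ge x^n + n x^{n-1} y$ for all $n = 2, 3, 4, \dots$. Equality can only occur if $y = 0$.

Application. $\left(1 + \frac{1}{n}\right)^n$ increases with $n$.

Proof. It suffices to show that $\left(1 + \frac{1}{n+1}\right)^{n+1} \Big/ \left(1 + \frac{1}{n}\right)^n > 1$. For this we rewrite and apply Bernoulli's inequality:
\[
\begin{aligned}
\frac{\left(1 + \frac{1}{n+1}\right)^{n+1}}{\left(1 + \frac{1}{n}\right)^{n}}
&= \left(1 + \frac{1}{n}\right) \left( \frac{1 + \frac{1}{n+1}}{1 + \frac{1}{n}} \right)^{n+1} \\
&= \left(1 + \frac{1}{n}\right) \left( \frac{n^2 + 2n}{(n+1)^2} \right)^{n+1} \\
&= \left(1 + \frac{1}{n}\right) \left( 1 - \frac{1}{(n+1)^2} \right)^{n+1} \\
&> \left(1 + \frac{1}{n}\right) \left( 1 - \frac{1}{n+1} \right) = 1 \quad \text{(by Bernoulli)}.
\end{aligned}
\]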
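The monotonicity just proved is easy to observe numerically. The sketch below (the function name `compound` is a hypothetical choice) computes $(1 + 1/n)^n$ as an exact fraction for the first few $n$ and confirms that each term is strictly smaller than the next.

```python
from fractions import Fraction

def compound(n):
    """Return (1 + 1/n)**n as an exact fraction."""
    return (1 + Fraction(1, n)) ** n

values = [compound(n) for n in range(1, 40)]
# Each term is strictly smaller than its successor.
assert all(a < b for a, b in zip(values, values[1:]))
```

Of course, a finite check like this illustrates the claim only for the range tested; the proof above covers every $n$.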
n 


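Example. $n! < \left(\frac{n}{2}\right)^n$ for $n \ge 6$.

Proof. It's not hard to check directly that $6! < 3^6$. For the inductive step, recall that $\left(1 + \frac{1}{n}\right)^n \ge 2$ by Bernoulli's inequality, so that
\[
\left(\frac{n+1}{2}\right)^{n+1} = \frac{n+1}{2} \left(\frac{n+1}{n}\right)^{n} \left(\frac{n}{2}\right)^{n} > \frac{n+1}{2} \cdot 2 \cdot n! = (n+1)!.
\]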


Bernoulli's inequality is deceptively powerful. As evidence of this, we'll use it to prove a classical inequality due to Cauchy.

The Arithmetic-Geometric Mean Inequality. Given $n \in \mathbb{N}$ and positive numbers $a_1, a_2, \dots, a_n$, we have
\[
(a_1 a_2 \cdots a_n)^{1/n} \le \frac{a_1 + a_2 + \cdots + a_n}{n} \tag{AGM}
\]
with equality if and only if $a_1 = a_2 = \cdots = a_n$.


Proof. We'll proceed by induction, of course. While the inequality is plainly true for $n = 1$, it's not hard to establish for $n = 2$ (see the exercises). So, suppose that the theorem holds for some fixed $n$ and all choices of $a_1, a_2, \dots, a_n > 0$. Given $a_{n+1} > 0$, let's suppose (for simplicity of notation) that $a_{n+1} \ge a_k$ for $k = 1, \dots, n$ (otherwise, relabel the terms). Let's also agree to write
\[
G_k = (a_1 a_2 \cdots a_k)^{1/k} \quad \text{and} \quad A_k = \frac{a_1 + a_2 + \cdots + a_k}{k}.
\]
Then
\[
A_{n+1} = \frac{nA_n + a_{n+1}}{n+1} = A_n + \frac{a_{n+1} - A_n}{n+1},
\]
where, by assumption, $a_{n+1} \ge A_n$. Thus, by our Corollary to Bernoulli's inequality,
\[
\begin{aligned}
A_{n+1}^{n+1} &\ge A_n^{n+1} + (n+1) A_n^{n} \left( \frac{a_{n+1} - A_n}{n+1} \right) && (3) \\
&= a_{n+1} A_n^{n} \\
&\ge a_{n+1} G_n^{n} = G_{n+1}^{n+1}. && (4)
\end{aligned}
\]
Note that equality in (3) would force $a_{n+1} = A_n$ while equality in (4) would force $a_1 = \cdots = a_n$ (by hypothesis). Hence, equality throughout would yield $a_1 = a_2 = \cdots = a_{n+1}$.
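For readers who like to experiment, here is a small Python sketch that compares the two means on randomly generated positive lists. The function names, the random ranges, and the tolerance are all assumptions made for illustration; the tolerance only absorbs floating-point rounding.

```python
import math
import random

def arithmetic_mean(a):
    return sum(a) / len(a)

def geometric_mean(a):
    return math.prod(a) ** (1 / len(a))

random.seed(0)  # fixed seed so the run is reproducible
for _ in range(1000):
    a = [random.uniform(0.1, 10.0) for _ in range(random.randint(1, 8))]
    # AGM: the geometric mean never exceeds the arithmetic mean.
    assert geometric_mean(a) <= arithmetic_mean(a) + 1e-12
```

When all entries coincide, the two means agree (up to rounding), which matches the equality case in the theorem.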
The AGM is extremely important and well worth examining in greater detail, including variations, extensions, and a number of alternate proofs.


Example. Amongst all rectangles having a fixed perimeter $P$, the square has maximum area.

Proof. Given a rectangle with sides of length $a$ and $b$ and area $A = ab$, we have
\[
\sqrt{A} = \sqrt{ab} \le \frac{a + b}{2} = \frac{P}{4} \quad \text{(a constant)}.
\]
Thus, $A$ is a maximum when $A = (P/4)^2$, which occurs when $a = b = P/4$.


This is a simplified example of what is sometimes called Dido's problem or the isoperimetric problem: Amongst all planar regions having a fixed perimeter, which one has maximum area? The answer (which we may have time to pursue later) is the circle.
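Example. Find the dimensions of the most economical 12 ounce soda can.

Solution. This is a familiar problem from calculus: We want to find the right-circular cylinder having fixed volume $V = \pi r^2 h$ and minimum surface area $S = 2\pi r^2 + 2\pi r h$. But
\[
\begin{aligned}
S &= 2\pi r^2 + 2\pi r \cdot \frac{V}{\pi r^2} \\
&= 2\pi r^2 + \frac{2V}{r} \\
&= 2\pi r^2 + \frac{V}{r} + \frac{V}{r} && (5) \\
&\ge 3 \left( 2\pi r^2 \cdot \frac{V}{r} \cdot \frac{V}{r} \right)^{1/3} && (6) \\
&= 3 \left( 2\pi V^2 \right)^{1/3},
\end{aligned}
\]
where rewriting in (5) facilitates the application of the AGM in (6). Now equality occurs when $2\pi r^2 = V/r$; i.e., when $r = (V/2\pi)^{1/3}$. For this value of $r$ we have
\[
h = \frac{V}{\pi r^2} = \frac{V}{\pi} \left( \frac{2\pi}{V} \right)^{2/3} = 2 \left( \frac{V}{2\pi} \right)^{1/3} = 2r.
\]
In other words, the can should be as tall as it is wide. Not very realistic, but easy to solve.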
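To put numbers on the solution, the following sketch evaluates the optimal radius and height for a specific volume. It assumes that 12 US fluid ounces is roughly 355 cubic centimeters; the function names `surface_area` and `best_radius` are hypothetical.

```python
import math

def surface_area(r, volume):
    """Surface area 2*pi*r^2 + 2V/r of a closed can of the given volume."""
    return 2 * math.pi * r**2 + 2 * volume / r

def best_radius(volume):
    """Radius at which the AGM bound is attained: r = (V / 2*pi)^(1/3)."""
    return (volume / (2 * math.pi)) ** (1 / 3)

V = 355.0  # assumed volume in cubic centimeters (about 12 US fluid ounces)
r = best_radius(V)
h = V / (math.pi * r**2)
print(f"r = {r:.2f} cm, h = {h:.2f} cm, S = {surface_area(r, V):.1f} cm^2")
```

The computed height comes out as twice the radius, and nudging the radius in either direction increases the surface area, as the AGM argument predicts.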
Cauchy's original proof of the AGM used a technique called backward induction: Suppose that $P(n)$ is a proposition about the natural number $n$ and suppose that
— P(n) holds for infinitely many n, and 
— P(n—1) holds whenever P(n) holds. 

Then P(n) holds for all n. 


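Cauchy's proof of the AGM. We first establish (AGM) for $n = 2^m$, $m = 1, 2, \dots$. To begin, note that
\[
a_1 a_2 = \left( \frac{a_1 + a_2}{2} \right)^2 - \left( \frac{a_1 - a_2}{2} \right)^2 < \left( \frac{a_1 + a_2}{2} \right)^2
\]
unless $a_1 = a_2$. The inductive step will be clear once we progress from $m = 1$ to $m = 2$:
\[
\begin{aligned}
a_1 a_2 a_3 a_4 &\le \left( \frac{a_1 + a_2}{2} \right)^2 \left( \frac{a_3 + a_4}{2} \right)^2 && (7) \\
&\le \left( \frac{\frac{a_1 + a_2}{2} + \frac{a_3 + a_4}{2}}{2} \right)^{2 \cdot 2} && (8) \\
&= \left( \frac{a_1 + a_2 + a_3 + a_4}{4} \right)^4.
\end{aligned}
\]
Equality in (7) forces $a_1 = a_2$ and $a_3 = a_4$, while equality in (8) forces $a_1 + a_2 = a_3 + a_4$; thus, we have strict inequality in either (7) or (8) unless $a_1 = a_2 = a_3 = a_4$. Continuing leads to
\[
a_1 a_2 \cdots a_{2^m} < \left( \frac{a_1 + a_2 + \cdots + a_{2^m}}{2^m} \right)^{2^m}
\]
unless $a_1 = \cdots = a_{2^m}$. Now to see how the case of general $n$ follows, consider this: Given $n < 2^m$ and $a_1, \dots, a_n > 0$, set $b_1 = a_1, \dots, b_n = a_n$, and $b_{n+1} = \cdots = b_{2^m} = A$, where $A = (a_1 + \cdots + a_n)/n$. Then
\[
\begin{aligned}
a_1 \cdots a_n A^{2^m - n} = b_1 \cdots b_{2^m} &\le \left( \frac{b_1 + \cdots + b_{2^m}}{2^m} \right)^{2^m} && (9) \\
&= \left( \frac{nA + (2^m - n)A}{2^m} \right)^{2^m} = A^{2^m}.
\end{aligned}
\]
Hence, $a_1 \cdots a_n \le A^n$. Note that equality in (9) would force $a_1 = \cdots = a_n$.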


As a corollary, we get a special case of Young's inequality, which we will see again very soon.

Corollary. Let $x, y > 0$ and let $m$, $n$ be integers with $1 \le m \le n$. Then
\[
x^{m/n} y^{1 - m/n} \le \frac{mx + (n - m)y}{n}.
\]
That is, if $r$ is a rational number satisfying $0 < r < 1$, then $x^r y^{1-r} \le rx + (1 - r)y$.


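We conclude this section with a few stray facts from algebra and a few tools from calculus that will prove helpful in the sequel.

The Binomial Theorem. $(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k$.

Application. $\left(1 + \frac{1}{n}\right)^n \le \sum_{k=0}^{n} \frac{1}{k!}$ for $n = 1, 2, 3, \dots$.

Proof.
\[
\left(1 + \frac{1}{n}\right)^n = 1 + \sum_{k=1}^{n} \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{1}{n^k} \le \sum_{k=0}^{n} \frac{1}{k!}.
\]

As it happens, $\sum_{k=0}^{n} \frac{1}{k!} < \left(1 + \frac{1}{n}\right)^{n+1}$, but this is harder to prove.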


Geometric Series. For $r \ne 1$ we have $1 + r + r^2 + \cdots + r^n = \dfrac{1 - r^{n+1}}{1 - r}$. Thus, for $|r| < 1$, we have $\sum_{k=0}^{\infty} r^k = \dfrac{1}{1 - r}$. (See the exercises for a proof that $r^n \to 0$ for $|r| < 1$.)


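Application. $\sum_{k=0}^{n} \frac{1}{k!} < 3$ for all $n = 1, 2, \dots$.

Proof. As we've seen, $k! \ge 2^{k-1}$ for $k \ge 2$. Hence,
\[
\begin{aligned}
\sum_{k=0}^{n} \frac{1}{k!} &= 1 + 1 + \sum_{k=2}^{n} \frac{1}{k!} \\
&\le 1 + 1 + \sum_{k=2}^{n} \frac{1}{2^{k-1}} \\
&= 1 + \sum_{k=0}^{n-1} \left( \frac{1}{2} \right)^k \\
&< 1 + \frac{1}{1 - (1/2)} = 3.
\end{aligned}
\]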
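The two applications above sandwich the partial sums of $\sum 1/k!$ between $(1 + 1/n)^n$ and $3$. The short sketch below checks this chain exactly for small $n$; the helper name `partial_sum` is an illustrative choice.

```python
from fractions import Fraction
from math import factorial

def partial_sum(n):
    """Return 1/0! + 1/1! + ... + 1/n! as an exact fraction."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

for n in range(1, 25):
    # (1 + 1/n)^n <= sum_{k<=n} 1/k! < 3
    assert (1 + Fraction(1, n)) ** n <= partial_sum(n) < 3
```

Note that the left-hand inequality is an equality when $n = 1$, which is why it is stated as a weak inequality.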


The Mean Value Theorem. If $f : I \to \mathbb{R}$ is differentiable, then for $x, y \in I$, we have
\[
f(x) - f(y) = f'(c)(x - y)
\]
for some $c$ between $x$ and $y$.


Application. For $x > -1$ and $r > 1$, we have $(1 + x)^r \ge 1 + rx$ with equality only at $x = 0$.

Proof. If $f(x) = (1 + x)^r$ for $x > -1$, then
\[
(1 + x)^r - 1 = f(x) - f(0) = r(1 + c)^{r-1} x
\]
for some $c$ between $x$ and $0$. If $x > c > 0$, then $1 + c > 1$ and, hence, $(1 + x)^r - 1 > rx$. If $-1 < x < c < 0$, then $0 < 1 + c < 1$ and, again, $(1 + x)^r - 1 > rx$.


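Application. $\left(1 + \frac{1}{x}\right)^x$ increases while $\left(1 + \frac{1}{x}\right)^{x+1}$ decreases for $x > 0$.

Proof. For $x > 0$, the mean value theorem assures us that $\log(x + 1) - \log x = 1/c$ for some $x < c < x + 1$. Thus,
\[
\frac{d}{dx} \bigl\{ x[\log(x + 1) - \log x] \bigr\} = \log(x + 1) - \log x - \frac{1}{x + 1} = \frac{1}{c} - \frac{1}{x + 1} > 0,
\]
and it follows that $\left(1 + \frac{1}{x}\right)^x = e^{x[\log(x+1) - \log x]}$ is increasing. Similarly,
\[
\frac{d}{dx} \bigl\{ (x + 1)[\log(x + 1) - \log x] \bigr\} = \log(x + 1) - \log x - \frac{1}{x} = \frac{1}{c} - \frac{1}{x} < 0,
\]
which means that $\left(1 + \frac{1}{x}\right)^{x+1} = e^{(x+1)[\log(x+1) - \log x]}$ is decreasing.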
Here's a final application (whose proof is left as an exercise).

Application. If $x \ne y$ are positive, then
(i) $r x^{r-1}(x - y) > x^r - y^r > r y^{r-1}(x - y)$ for $r < 0$ or $r > 1$;
(ii) $r x^{r-1}(x - y) < x^r - y^r < r y^{r-1}(x - y)$ for $0 < r < 1$.

To see how this is related to our earlier work, watch closely:
\[
\begin{aligned}
x^r - y^r < r y^{r-1}(x - y) \quad (0 < r < 1)
&\implies x^r < r y^{r-1}(x - y) + y^r \\
&\implies x^r y^{1-r} < r(x - y) + y = rx + (1 - r)y.
\end{aligned}
\]
Look familiar?
Problem Set 1

1. If $a > b > 0$ and if $p$ and $q$ are positive integers, prove that $a^{p/q} > b^{p/q}$. In particular, given $a, b > 0$, conclude that $a > b$ if and only if $a^{p/q} > b^{p/q}$.

2. If $a \ge b > 0$, $p$ is a nonnegative integer, and $q$ is a positive integer, show that $a^{p/q} \ge b^{p/q}$, with equality occurring if and only if either (i) $a = b$ or (ii) $p = 0$.

3. If $a_1 \ge b_1 > 0$, $a_2 \ge b_2 > 0$, ..., $a_n \ge b_n > 0$, show that $a_1 a_2 \cdots a_n \ge b_1 b_2 \cdots b_n$. Moreover, equality holds if and only if $a_1 = b_1$, $a_2 = b_2$, ..., $a_n = b_n$.

4. If $m$ and $n$ are positive integers, show that $\sqrt{2}$ lies between $m/n$ and $(m + 2n)/(m + n)$. [Hint: Either $m/n < \sqrt{2}$ or $m/n > \sqrt{2}$.]

5. From the inequality $(a - b)^2 \ge 0$, deduce that $2ab \le a^2 + b^2$, with equality occurring if and only if $a = b$.

6. Given $a, b > 0$, show that $\sqrt{ab} \le (a + b)/2$, with equality if and only if $a = b$.

7. Amongst all rectangles having a fixed area $A$, prove that the square (with sides of length $\sqrt{A}$) has the smallest perimeter.

8. From the inequality $\left( \frac{1}{\sqrt{a}} - \frac{1}{\sqrt{b}} \right)^2 \ge 0$, deduce that
\[
\frac{2}{(1/a) + (1/b)} \le \sqrt{ab}
\]
for all positive $a$, $b$. Under what circumstances does equality hold?

9. Show that $a + (1/a) \ge 2$ for all positive values of $a$. When does equality occur?

10. If $0 < c < 1$, use Bernoulli's inequality to show that $c^n \to 0$. [Hint: Write $c = 1/(1 + x)$ where $x > 0$.]

11. If $c > 0$, use Bernoulli's inequality to show that $c^{1/n} \to 1$. [Hint: If $c > 1$, write $c^{1/n} = 1 + x_n$, where $x_n > 0$, and estimate $x_n$.]


12. Given $x > 0$, use Bernoulli's inequality to show that $(1 + (x/n))^n$ increases while $(1 + (x/n))^{n+x}$ decreases as $n$ increases.


13. Given $x > 0$, show that $(1 + x)^r > 1 + rx$ for any rational $r > 1$. [Hint: Write $r = p/q$, where $p > q$ are positive integers, and note that $(1 + (x/p))^p > (1 + (x/q))^q$ for $x > 0$.]

14. Prove by induction that $\left( \frac{n}{3} \right)^n < n! < \left( \frac{n}{2} \right)^n$ for $n \ge 6$.
15. Use the fact that $k! \ge 2^{k-1}$, for all $k = 1, 2, 3, \dots$, to prove that $\sum_{k=0}^{n} \frac{1}{k!} < 3$ for all $n = 1, 2, \dots$.

16. Suppose that $0 < x_i < 1$ for $i = 1, 2, \dots$. Prove that

ewe (LS a,) Tio ee ay) io ee, i) 
forall fi 2S, 


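17. For $n = 1, 2, 3, \dots$, show that $2(\sqrt{n+1} - \sqrt{n}) < \frac{1}{\sqrt{n}} < 2(\sqrt{n} - \sqrt{n-1})$ and use this to prove that, for $n = 2, 3, \dots$,
\[
2\sqrt{n} - 2 < \sum_{k=1}^{n} \frac{1}{\sqrt{k}} < 2\sqrt{n} - 1.
\]

18. Prove that
\[
\frac{1}{\sqrt{4n + 1}} < \frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n - 1}{2n} < \frac{1}{\sqrt{3n + 1}}
\]
for $n = 2, 3, 4, \dots$.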


Convex Functions

A function $f : I \to \mathbb{R}$, defined on a nontrivial interval $I$, is said to be convex if
\[
f(\lambda x + \mu y) \le \lambda f(x) + \mu f(y) \tag{1}
\]
whenever $x, y \in I$, $\lambda, \mu \ge 0$, $\lambda + \mu = 1$. If the inequality in (1) is always strict for $x \ne y$, $\lambda, \mu > 0$, we say that $f$ is strictly convex. We say that $f$ is concave (resp., strictly concave) if $-f$ is convex (resp., strictly convex).


Examples.

(1) $f(x) = ax + b$ is convex (and concave, too, of course).

(2) $f(x) = |x|$ is convex because $|\lambda x + \mu y| \le \lambda |x| + \mu |y|$ for $\lambda, \mu \ge 0$.

(3) $f(x) = x^2$ is strictly convex.

Proof. Because $\lambda + \mu = 1$ we have
\[
\begin{aligned}
\lambda x^2 + \mu y^2 - (\lambda x + \mu y)^2 &= \lambda\mu x^2 + \lambda\mu y^2 - 2\lambda\mu xy \\
&= \lambda\mu (x - y)^2 > 0,
\end{aligned}
\]
provided $\lambda, \mu > 0$ and $x \ne y$.

As it happens, the rather unassuming inequality in (1) actually implies that convex functions are quite well behaved. For example, it's not terribly hard to prove the following:

Fact. Let $f : I \to \mathbb{R}$ be convex, where $I$ is an open interval. Then

(i) $f$ is continuous on $I$.

(ii) $f$ is differentiable at all but (at most) countably many points of $I$.

(iii) $f'$ is continuous wherever it exists.

An equivalent characterization of convexity is provided by the "three chords lemma."
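Theorem. Let $f : I \to \mathbb{R}$. Then $f$ is convex if and only if
\[
\frac{f(z) - f(x)}{z - x} \le \frac{f(y) - f(x)}{y - x} \le \frac{f(y) - f(z)}{y - z} \tag{2}
\]
for all $x < z < y$ in $I$.

Proof. First suppose that $f$ is convex. Given $x < z < y$, we can write
\[
z = \frac{y - z}{y - x} \cdot x + \frac{z - x}{y - x} \cdot y
\]
(where the coefficients of $x$ and $y$ are nonnegative and add to 1). Thus,
\[
f(z) \le \frac{y - z}{y - x} f(x) + \frac{z - x}{y - x} f(y).
\]
The inequalities in (2) now follow by rewriting; for example,
\[
f(z) - f(x) \le \left( \frac{y - z}{y - x} - 1 \right) f(x) + \frac{z - x}{y - x} f(y) = \frac{z - x}{y - x} \bigl( f(y) - f(x) \bigr),
\]
which is the first half of (2). The other half is entirely similar.

Now suppose that (2) holds. Take $x < y$ in $I$, $\lambda, \mu > 0$, $\lambda + \mu = 1$. Then $z = \lambda x + \mu y$ satisfies
\[
z - x = \mu(y - x) \quad \text{and} \quad y - z = \lambda(y - x).
\]
That is, $z = \frac{y - z}{y - x} \cdot x + \frac{z - x}{y - x} \cdot y$. Thus, from (2),
\[
\begin{aligned}
f(z) &\le f(x) + \frac{z - x}{y - x} \bigl( f(y) - f(x) \bigr) \\
&= f(x) + \mu \bigl( f(y) - f(x) \bigr) \\
&= \lambda f(x) + \mu f(y).
\end{aligned}
\]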
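The three chords lemma lends itself to a simple numerical experiment: pick three points $x < z < y$ and compare the slopes of the three chords. The sketch below does exactly that; the function names and the tolerance are assumptions made for this illustration, and a passing check at one triple of points is of course only a necessary condition for convexity.

```python
import math

def chord_slope(f, u, v):
    """Slope of the chord of f joining (u, f(u)) and (v, f(v))."""
    return (f(v) - f(u)) / (v - u)

def three_chords_hold(f, x, z, y, tol=1e-12):
    """Check slope[x,z] <= slope[x,y] <= slope[z,y] for x < z < y."""
    left = chord_slope(f, x, z)
    middle = chord_slope(f, x, y)
    right = chord_slope(f, z, y)
    return left <= middle + tol and middle <= right + tol

print(three_chords_hold(math.exp, -1.0, 0.5, 2.0))  # exp is convex
print(three_chords_hold(math.sin, 0.5, 1.5, 2.5))   # sin is concave on (0, pi)
```

A function that fails the check at even one triple of points cannot be convex on an interval containing those points.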
Corollary. If $f : (a, b) \to \mathbb{R}$ is differentiable and convex, then
\[
f'(x) \le \frac{f(y) - f(x)}{y - x} \le f'(y)
\]
for all $x < y$ in $(a, b)$. In particular, $f'$ is increasing.


The converse of this result is also true and, indeed, well worth expanding on. 




Theorem. Let $f : (a, b) \to \mathbb{R}$ be differentiable. Then the following are equivalent:
(i) $f$ is convex;
(ii) $f'$ is increasing;
(iii) $f(x) + f'(x)(y - x) \le f(y)$ for all $x$, $y$ in $(a, b)$.

Note that the left-hand side of (iii) is the tangent line to the graph of $f$ at $x$. Thus, (iii) tells us that every tangent line to the graph lies below the graph. A similar result holds for nondifferentiable convex functions (see the exercises for details).

Proof. (i) implies (ii) is clear from the Corollary. (ii) implies (iii) follows from the mean value theorem. Indeed, for $x < y$,
\[
\begin{aligned}
f(y) &= f(x) + f'(c)(y - x), \quad \text{where } x < c < y, \\
&\ge f(x) + f'(x)(y - x),
\end{aligned}
\]
and the case $y < x$ is similar.

Finally, (iii) implies (i). Given $x < z < y$ we have:
\[
\begin{aligned}
f(x) + f'(x)(z - x) &\le f(z), \\
f(z) + f'(z)(x - z) &\le f(x), \\
f(z) + f'(z)(y - z) &\le f(y).
\end{aligned}
\]
Thus,
\[
\frac{f(z) - f(x)}{z - x} \le f'(z) \le \frac{f(y) - f(z)}{y - z}.
\]
As in the proof of the three chords lemma, writing $z = \lambda x + \mu y$ leads to $\lambda \bigl( f(z) - f(x) \bigr) \le \mu \bigl( f(y) - f(z) \bigr)$ or $f(z) \le \lambda f(x) + \mu f(y)$.


Claim. If $f : (a, b) \to \mathbb{R}$ is twice-differentiable, then $f$ is convex if and only if $f'' \ge 0$ (that is, $f$ is "concave up"). If $f'' > 0$, then $f$ is strictly convex. (See the exercises.)

Examples.

(1) $x^p$ is convex on $(0, \infty)$ for $p \ge 1$, and strictly convex for $p > 1$.

(2) $e^x$ is strictly convex on $\mathbb{R}$.

(3) $\log x$ is strictly concave on $(0, \infty)$.

(4) $x \log x$ is strictly convex on $(0, \infty)$.
Applications to Classical Inequalities

We begin with an easy extension of our characterization of convex functions.

Jensen's Inequality. $f : I \to \mathbb{R}$ is convex if and only if
\[
f(\lambda_1 x_1 + \cdots + \lambda_n x_n) \le \lambda_1 f(x_1) + \cdots + \lambda_n f(x_n)
\]
whenever $x_1, \dots, x_n \in I$, $\lambda_1, \dots, \lambda_n \ge 0$, $\lambda_1 + \cdots + \lambda_n = 1$.


Jensen’s characterization follows easily from ours by induction. 


Armed with our knowledge of convex functions, we're ready for a fresh attack on certain classical inequalities. We begin with a generalization of the AGM (but please compare this result with Young's inequality).

Theorem. Let $x_1, \dots, x_k$, $\alpha_1, \dots, \alpha_k > 0$ with $\alpha_1 + \cdots + \alpha_k = 1$. Then
\[
x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_k^{\alpha_k} \le \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k
\]
with equality if and only if $x_1 = \cdots = x_k$.

Proof. Because $\log x$ is strictly concave on $(0, \infty)$ we have
\[
\begin{aligned}
\log \left( x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_k^{\alpha_k} \right) &= \alpha_1 \log x_1 + \alpha_2 \log x_2 + \cdots + \alpha_k \log x_k \\
&\le \log(\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k),
\end{aligned}
\]
with equality if and only if $x_1 = \cdots = x_k$. And, because $\log x$ (or $e^x$, if you prefer) is strictly increasing, the result follows.

The case $\alpha_1 = \cdots = \alpha_k = 1/k$ is the familiar AGM. Another special case is also familiar and will lead us to two new inequalities.


Young's Inequality. If $p, q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, then $xy \le \frac{1}{p} x^p + \frac{1}{q} y^q$ for all $x, y \ge 0$, with equality if and only if $x^p = y^q$.
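A small numerical sketch may help make the equality case concrete. The function `young_gap` below (a hypothetical name) returns the difference between the two sides of Young's inequality, which should be nonnegative and should vanish exactly when $x^p = y^q$.

```python
def young_gap(x, y, p):
    """Return x**p/p + y**q/q - x*y, where q is the conjugate exponent of p > 1."""
    q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1
    return x**p / p + y**q / q - x * y

# Sample grid of nonnegative values; the tolerance absorbs rounding error.
for p in (1.5, 2.0, 3.0):
    for x in (0.0, 0.5, 1.0, 2.0):
        for y in (0.0, 0.5, 1.0, 4.0):
            assert young_gap(x, y, p) >= -1e-12
```

For instance, with $p = 3$ (so $q = 3/2$), the pair $x = 2$, $y = 4$ satisfies $x^p = y^q = 8$, and the gap is zero up to rounding.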


This version of Young’s inequality leads to a simple proof of: 
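Hölder's Inequality. Let $x_1, \dots, x_n$, $y_1, \dots, y_n \ge 0$ and let $p, q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Then
\[
\sum_{i=1}^{n} x_i y_i \le \left( \sum_{i=1}^{n} x_i^p \right)^{1/p} \left( \sum_{i=1}^{n} y_i^q \right)^{1/q} \tag{3}
\]
with equality if and only if $(x_i^p)$ and $(y_i^q)$ are proportional; that is, if and only if, for some $\alpha$, $\beta$, not both zero, we have $\alpha x_i^p = \beta y_i^q$ for all $i = 1, \dots, n$. The case $p = q = 2$ is usually referred to as the Cauchy-Schwarz inequality.

Proof. We may assume that $u = \left( \sum_{i=1}^{n} x_i^p \right)^{1/p} \ne 0$ and $v = \left( \sum_{i=1}^{n} y_i^q \right)^{1/q} \ne 0$, in which case we have, from Young's inequality, that
\[
\frac{x_i}{u} \cdot \frac{y_i}{v} \le \frac{1}{p} \left( \frac{x_i}{u} \right)^p + \frac{1}{q} \left( \frac{y_i}{v} \right)^q \tag{4}
\]
for each $i$, with equality if and only if $(x_i/u)^p = (y_i/v)^q$. Summing over $i$, we get
\[
\frac{1}{uv} \sum_{i=1}^{n} x_i y_i \le \frac{1}{p u^p} \sum_{i=1}^{n} x_i^p + \frac{1}{q v^q} \sum_{i=1}^{n} y_i^q = \frac{1}{p} + \frac{1}{q} = 1.
\]
Thus, $\sum_{i=1}^{n} x_i y_i \le uv$. Please note that equality in (3) would force equality in (4) for all $i$ and, hence, $(x_i/u)^p = (y_i/v)^q$ for all $i$; that is, $(x_i^p)$ and $(y_i^q)$ would be proportional.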
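Here is a brief Python sketch that computes both sides of Hölder's inequality for given nonnegative lists. The name `holder_sides` and the sample vectors are illustrative assumptions; the second sample is chosen so that $(x_i^p)$ and $(y_i^q)$ are proportional.

```python
def holder_sides(x, y, p):
    """Return (sum x_i*y_i, ||x||_p * ||y||_q) for the conjugate exponent q of p > 1."""
    q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1
    lhs = sum(a * b for a, b in zip(x, y))
    rhs = sum(a**p for a in x) ** (1 / p) * sum(b**q for b in y) ** (1 / q)
    return lhs, rhs

# Cauchy-Schwarz case (p = q = 2): strict inequality for non-proportional data.
print(holder_sides([1.0, 2.0], [3.0, 4.0], 2.0))
# p = 3, q = 3/2 with y_i = x_i**2, so that x_i**p == y_i**q: the sides agree.
print(holder_sides([1.0, 2.0], [1.0, 4.0], 3.0))
```

In the second call both sides equal $9$ up to rounding, in line with the equality condition stated in the theorem.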


Corollary. Let $x_1, \dots, x_n$, $y_1, \dots, y_n > 0$, let $0 < p < 1$, and let $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Then
\[
\sum_{i=1}^{n} x_i y_i \ge \left( \sum_{i=1}^{n} x_i^p \right)^{1/p} \left( \sum_{i=1}^{n} y_i^q \right)^{1/q}
\]
with equality if and only if $(x_i^p)$ and $(y_i^q)$ are proportional.

Proof. This version actually follows from our previous version after a bit of judicious rewriting:
\[
\sum_{i=1}^{n} x_i^p = \sum_{i=1}^{n} (x_i y_i)^p \, y_i^{-p}. \tag{5}
\]
Now we apply Hölder's inequality using the conjugate pair of exponents
\[
p' = \frac{1}{p} > 1 \quad \text{and} \quad q' = -\frac{q}{p} > 1.
\]
Note that
\[
\frac{1}{p'} + \frac{1}{q'} = p + \left( -\frac{p}{q} \right) = p \left( 1 - \frac{1}{q} \right) = 1.
\]
Applying Hölder's inequality with this pair of exponents yields
\[
\sum_{i=1}^{n} (x_i y_i)^p \, y_i^{-p} \le \left( \sum_{i=1}^{n} x_i y_i \right)^{p} \left( \sum_{i=1}^{n} y_i^q \right)^{-p/q}.
\]
Taking $p$-th roots and rearranging the terms finishes the proof.


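We will see a few more variations and extensions of Hölder's inequality. For now we'll settle for the following:

Theorem. Let $a = (a_i)$, $b = (b_i)$, ..., $h = (h_i)$ denote elements of $\mathbb{R}^n$ in which all entries are positive, let $\alpha, \beta, \dots, \theta > 0$ with $\alpha + \beta + \cdots + \theta = 1$. Then
\[
\sum_{i=1}^{n} a_i^{\alpha} b_i^{\beta} \cdots h_i^{\theta} \le \left( \sum_{i=1}^{n} a_i \right)^{\alpha} \left( \sum_{i=1}^{n} b_i \right)^{\beta} \cdots \left( \sum_{i=1}^{n} h_i \right)^{\theta}
\]
with equality if and only if $a, b, \dots, h$ are all proportional.

The proof is essentially identical to the proof of Hölder's inequality for two sets of numbers: We apply Young's inequality (or the AGM) to the expression
\[
\left( \frac{a_i}{\sum_j a_j} \right)^{\alpha} \left( \frac{b_i}{\sum_j b_j} \right)^{\beta} \cdots \left( \frac{h_i}{\sum_j h_j} \right)^{\theta}
\]
and sum over $i$.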
Problem Set 2

1. If $f : I \to \mathbb{R}$ is convex, show that $f(\lambda_1 x_1 + \cdots + \lambda_k x_k) \le \lambda_1 f(x_1) + \cdots + \lambda_k f(x_k)$, for any $x_1, \dots, x_k \in I$ and any $\lambda_1, \dots, \lambda_k \ge 0$ with $\lambda_1 + \cdots + \lambda_k = 1$. [This is sometimes called Jensen's inequality, and follows easily by induction on $k$.] If $f$ is strictly convex, prove that equality occurs in Jensen's inequality (if and) only if all the $x_i$ are equal.

2. Let $f, g : I \to \mathbb{R}$ be convex functions and let $\alpha \in \mathbb{R}$. Determine which of the following are convex functions: $\alpha f$, $f + g$, $f - g$, $|f|$, $fg$, $\max\{f, g\}$, $\min\{f, g\}$.

3. Let $f, g : \mathbb{R} \to \mathbb{R}$ be convex functions with $g$ increasing. Show that $g \circ f$ is convex. Deduce that if $h : \mathbb{R} \to \mathbb{R}$ is a positive function and if $\log h$ is convex, then $h$ is convex. Give an example of a positive convex function on $\mathbb{R}$ whose logarithm is not convex.

4. Show that the reciprocal of a positive concave function is convex. Is the reciprocal of a positive convex function always concave?

5. Prove that $f : (0, \infty) \to \mathbb{R}$ is convex if and only if $g(x) = x f(1/x)$ is convex on $(0, \infty)$.

6. If $f$ is convex on $(0, \infty)$, and if $x_1, \dots, x_m, y_1, \dots, y_m > 0$, show that
\[
(x_1 + \cdots + x_m) \, f\!\left( \frac{y_1 + \cdots + y_m}{x_1 + \cdots + x_m} \right) \le x_1 f\!\left( \frac{y_1}{x_1} \right) + \cdots + x_m f\!\left( \frac{y_m}{x_m} \right).
\]
Show that $f(x) = (1 + x^p)^{1/p}$ is convex on $(0, \infty)$ for $p \ge 1$, and deduce from the first part of the exercise that
\[
\bigl[ (x_1 + \cdots + x_m)^p + (y_1 + \cdots + y_m)^p \bigr]^{1/p} \le (x_1^p + y_1^p)^{1/p} + \cdots + (x_m^p + y_m^p)^{1/p}.
\]

7. Let $f : (a, b) \to \mathbb{R}$ be differentiable and convex. If $f'(x_0) = 0$, show that $f(x_0)$ is a global minimum for $f$.

8. Let $f : (a, b) \to \mathbb{R}$ be twice differentiable. Show that $f$ is convex if and only if $f'' \ge 0$. If $f'' > 0$, show that $f$ is strictly convex. Give an example of a twice differentiable, strictly convex function for which $f''$ fails to be strictly positive.

9. If $f, g : I \to \mathbb{R}$ are nonnegative, increasing, and convex, then so is $fg$. [Hint: First note that $[f(x) - f(y)][g(y) - g(x)] \le 0$ for $x < y$.]

10. Let $f : (a, b) \to \mathbb{R}$ be continuous. Show that $f$ is convex if and only if, for all $a < s < t < b$, we have $(t - s)^{-1} \int_s^t f(x)\,dx \le [f(t) + f(s)]/2$.

11. Let $f : I \to \mathbb{R}$ be convex. Then $f$ has left- and right-derivatives at each point $a$ in the interior of $I$ and, moreover, $f'_-(a) \le f'_+(a)$. If $a < b$ are points in the interior of $I$, show that
\[
f'_+(a) \le \frac{f(b) - f(a)}{b - a} \le f'_-(b).
\]


12. If $f : I \to \mathbb{R}$ is convex, then $f$ is locally Lipschitz on $I$; that is, $f$ is Lipschitz on every compact subinterval of $I$. In particular, $f$ is continuous on $I$.

13. We say that $T(x) = mx + b$ is a supporting line to the graph of $f$ at $x_0$ if $T(x_0) = f(x_0)$ and $T(x) \le f(x)$ for all $x$. Prove that $f : (a, b) \to \mathbb{R}$ is convex if and only if the graph of $f$ has a supporting line at each $x_0 \in (a, b)$. [Hint: For the forward implication, take $m$ with $f'_-(x_0) \le m \le f'_+(x_0)$.]

14. Prove that a convex function defined on a bounded interval is bounded below.

15. If $f : \mathbb{R} \to \mathbb{R}$ is convex and bounded above, prove that $f$ is constant.

16. Given $f : I \to \mathbb{R}$, the epigraph of $f$ is the set $\operatorname{epi} f = \{ (x, y) : x \in I, \ y \ge f(x) \}$ in $\mathbb{R}^2$. Show that $f$ is convex if and only if $\operatorname{epi} f$ is a convex subset of $\mathbb{R}^2$.

17. We say that $f : I \to \mathbb{R}$ is lower semicontinuous (l.s.c.) if, for each real $\alpha$, the set $\{ x \in I : f(x) \le \alpha \}$ is closed. Prove that a convex function $f : I \to \mathbb{R}$ is lower semicontinuous if and only if $\operatorname{epi} f$ is closed. For this reason, l.s.c. convex functions are often referred to as closed convex functions.
Problem Set 2, Problem 10

10. Let $f : (a, b) \to \mathbb{R}$ be continuous. Show that $f$ is convex if and only if, for all $a < s < t < b$, we have $(t - s)^{-1} \int_s^t f(x)\,dx \le [f(t) + f(s)]/2$.

Solution. ($\Longrightarrow$) If $f$ is convex and if $a < s < t < b$, then, on the interval $[s, t]$, the graph of $f$ lies below the chord joining $(s, f(s))$ and $(t, f(t))$; that is,
\[
f(x) \le f(s) + \left( \frac{f(t) - f(s)}{t - s} \right)(x - s), \quad s \le x \le t.
\]
Integrating both sides over $[s, t]$ yields:
\[
\int_s^t f(x)\,dx \le f(s)(t - s) + \frac{1}{2} \left( \frac{f(t) - f(s)}{t - s} \right)(t - s)^2 = \left( \frac{f(t) + f(s)}{2} \right)(t - s).
\]

($\Longleftarrow$) To prove the converse, it suffices to show that if $f$ is not convex, then, on some subinterval $[s, t] \subset (a, b)$, the graph of $f$ lies (strictly) above the chord joining $(s, f(s))$ and $(t, f(t))$. That is,
\[
f(x) > f(s) + \left( \frac{f(t) - f(s)}{t - s} \right)(x - s), \quad s < x < t,
\]
for then we just integrate both sides, as before, concluding that $(t - s)^{-1} \int_s^t f(x)\,dx > [f(t) + f(s)]/2$.

Now if $f$ is not convex, then, for some pair of points $a < s < t < b$ and some point
\[
x = \left( \frac{t - x}{t - s} \right) s + \left( \frac{x - s}{t - s} \right) t,
\]
we have
\[
f(x) > \left( \frac{t - x}{t - s} \right) f(s) + \left( \frac{x - s}{t - s} \right) f(t) = f(s) + \left( \frac{f(t) - f(s)}{t - s} \right)(x - s) = L(x).
\]
As suggested by Mr. Ghosh in class, we next consider:
\[
s_0 = \sup\{ y : s \le y \le x, \ f(y) = L(y) \} \quad \text{and} \quad t_0 = \inf\{ y : x \le y \le t, \ f(y) = L(y) \}.
\]
Each of the sets above is closed and bounded, so $s_0$ and $t_0$ exist and satisfy $s_0 < x < t_0$ and $f(s_0) = L(s_0)$, $f(t_0) = L(t_0)$. (The continuity of both $f$ and $L$ has come into play here.) Moreover, $f(y) > L(y)$ for all $s_0 < y < t_0$. (We can't have $f(y) = L(y)$, and the intermediate value theorem tells us that we can't have $f(y) < L(y)$.) All that remains is to note that the line through $(s_0, f(s_0))$ and $(t_0, f(t_0))$ coincides with the line through $(s, f(s))$ and $(t, f(t))$; that is, the chord joining $(s_0, f(s_0))$ and $(t_0, f(t_0))$ lies on the chord joining $(s, f(s))$ and $(t, f(t))$. (See the figure on the back.)
[Figure not reproduced.]


What makes this problem especially interesting is that there's a companion result (see Theorem 125 in Hardy, Littlewood, and Pólya).

10'. Let $f : (a, b) \to \mathbb{R}$ be continuous. Then $f$ is convex if and only if, for all $a < s < t < b$, we have $(t - s)^{-1} \int_s^t f(x)\,dx \ge f((s + t)/2)$.

The forward implication is easy: If $f$ is convex and if $a < s < t < b$, we can consider a supporting line at $(s + t)/2$; that is, we can find an $m$ such that
\[
f(x) \ge f\!\left( \frac{s + t}{2} \right) + m \left( x - \frac{s + t}{2} \right)
\]
for all $x$. Integration over $[s, t]$ then yields:
\[
\int_s^t f(x)\,dx \ge (t - s) \, f\!\left( \frac{s + t}{2} \right)
\]
because
\[
\int_s^t \left( x - \frac{s + t}{2} \right) dx = 0.
\]
I'm not sure how to prove the backward implication. Anyone interested?
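Taken together, Problem 10 and its companion say that, for a convex function, the average value over $[s, t]$ lies between the value at the midpoint and the average of the endpoint values. The sketch below illustrates this for one known convex function using a simple midpoint-rule approximation of the integral; the function names and the step count are assumptions, and the check says nothing about the open converse.

```python
import math

def average_value(f, s, t, steps=10_000):
    """Approximate the average of f over [s, t] with the midpoint rule."""
    h = (t - s) / steps
    total = sum(f(s + (i + 0.5) * h) for i in range(steps))
    return total * h / (t - s)

def sandwich(f, s, t):
    """Return (f(midpoint), average of f, average of endpoint values)."""
    return f((s + t) / 2), average_value(f, s, t), (f(s) + f(t)) / 2

lower, avg, upper = sandwich(math.exp, 0.0, 1.0)
print(lower, avg, upper)
```

For $f(x) = e^x$ on $[0, 1]$ the exact average is $e - 1$, which indeed sits between $e^{1/2}$ and $(1 + e)/2$.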
Elementary Means

Before we pursue more inequalities, let's introduce some notation. Given a vector $x = (x_i)$ in $\mathbb{R}^n$ with positive entries and a real number $p \ne 0$, we write
\[
S_p(x) = \left( \sum_{i=1}^{n} x_i^p \right)^{1/p} \quad \text{and} \quad M_p(x) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^p \right)^{1/p}.
\]
The expression $S_p(x)$ is sometimes written as $\|x\|_p$. The expression $M_p(x)$ is called the (simple) mean of $x$ of order $p$. Of course, we're somewhat familiar with such expressions when $p > 0$, but we will have occasion to explore the full range of values, even $p = \pm\infty$!

Given a set of positive weights $\alpha_1, \dots, \alpha_n > 0$ with $\alpha_1 + \cdots + \alpha_n = 1$, we write
\[
M_p(x, \alpha) = \left( \sum_{i=1}^{n} \alpha_i x_i^p \right)^{1/p}.
\]
$M_p(x, \alpha)$ is called the weighted mean of $x$ of order $p$.

Please note that all of these expressions are positive homogeneous; for example,
\[
M_p(kx, \alpha) = k M_p(x, \alpha) \quad \text{for } k > 0,
\]
where $kx = (kx_i)$. They're also increasing functions of $x$ in the sense that $M_p(x, \alpha) \le M_p(y, \alpha)$ whenever $x_i \le y_i$ for all $i$.

$M_1(x)$ is called the arithmetic mean of $x$; $M_2(x)$ is sometimes called the root-mean-square of $x$; $M_{-1}(x)$ is called the harmonic mean of $x$. One of our goals in this section is to deduce suitable expressions for $M_\infty$, $M_{-\infty}$, and $M_0$. First, a few simple

Observations.

1. $\min\{x_1, \dots, x_n\} \le M_p(x, \alpha) \le \max\{x_1, \dots, x_n\}$. Equality occurs (in either inequality) only if $x_1 = \cdots = x_n$.

This is reasonably clear if $p > 0$. If $p < 0$, simplest might be to appeal to the fact that
\[
M_{-p}(x, \alpha) = \frac{1}{M_p(1/x, \alpha)}
\]
where $1/x = (1/x_i)$.
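2. $M_0(x, \alpha) = \lim_{p \to 0} M_p(x, \alpha) = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$. Thus, in particular, we also have $\min\{x_1, \dots, x_n\} \le M_0(x, \alpha) \le \max\{x_1, \dots, x_n\}$.

Proof. First write
\[
M_p(x, \alpha) = \exp\left\{ \frac{1}{p} \log \left( \sum_{i=1}^{n} \alpha_i x_i^p \right) \right\}.
\]
Next, we appeal to l'Hôpital's rule to find
\[
\begin{aligned}
\lim_{p \to 0} \frac{\log \left( \sum_{i=1}^{n} \alpha_i x_i^p \right)}{p} &= \lim_{p \to 0} \frac{\sum_{i=1}^{n} \alpha_i x_i^p \log x_i}{\sum_{i=1}^{n} \alpha_i x_i^p} \\
&= \sum_{i=1}^{n} \alpha_i \log x_i \\
&= \log \left( x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \right).
\end{aligned}
\]

3. $M_\infty(x, \alpha) = \lim_{p \to +\infty} M_p(x, \alpha) = \max\{x_1, \dots, x_n\}$ and $M_{-\infty}(x, \alpha) = \lim_{p \to -\infty} M_p(x, \alpha) = \min\{x_1, \dots, x_n\}$.

Proof. Suppose that $x_k = \max\{x_1, \dots, x_n\}$. Then $\alpha_k^{1/p} x_k \le M_p(x, \alpha) \le x_k$ and $\alpha_k^{1/p} \to 1$ as $p \to +\infty$. The other case is similar but, again, we could just appeal to the fact that $M_{-p}(x, \alpha) = [M_p(1/x, \alpha)]^{-1}$.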
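4. For $s < t$, we have $M_s(x, \alpha) \le M_t(x, \alpha)$ with equality only if $x_1 = \cdots = x_n$.

Proof. First consider the case $s = 1$. Given $t > 1$, let $t'$ be the conjugate exponent: $\frac{1}{t} + \frac{1}{t'} = 1$. Then, from Hölder's inequality,
\[
\sum_{i=1}^{n} \alpha_i x_i = \sum_{i=1}^{n} \alpha_i^{1/t'} \alpha_i^{1/t} x_i \le \left( \sum_{i=1}^{n} \alpha_i \right)^{1/t'} \left( \sum_{i=1}^{n} \alpha_i x_i^t \right)^{1/t} = \left( \sum_{i=1}^{n} \alpha_i x_i^t \right)^{1/t}
\]
because $\sum_{i=1}^{n} \alpha_i = 1$. That is, $M_1(x, \alpha) \le M_t(x, \alpha)$. Equality can only occur if $(\alpha_i x_i^t)$ is proportional to $(\alpha_i)$; i.e., if and only if $(x_i^t)$ is proportional to $(1, 1, \dots)$; i.e., if and only if $x$ is constant.

Now consider the case $0 < s < t$. In this case, $t/s > 1$ and, hence,
\[
\left( \sum_{i=1}^{n} \alpha_i x_i^s \right)^{1/s} \le \left( \sum_{i=1}^{n} \alpha_i x_i^{s(t/s)} \right)^{(s/t)(1/s)} = \left( \sum_{i=1}^{n} \alpha_i x_i^t \right)^{1/t}.
\]

The case $s = 0$ is very similar. Given $t > 0$,
\[
\left( x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \right)^t = (x_1^t)^{\alpha_1} (x_2^t)^{\alpha_2} \cdots (x_n^t)^{\alpha_n} \le \sum_{i=1}^{n} \alpha_i x_i^t
\]
by the AGM. That is, $M_0(x, \alpha) \le M_t(x, \alpha)$.

The remaining cases follow easily from what we've already shown. For example, $M_0(x, \alpha) \le M_1(x, \alpha)$, for all $x$, implies that $M_0(1/x, \alpha) \le M_1(1/x, \alpha)$; that is,
\[
x_1^{-\alpha_1} x_2^{-\alpha_2} \cdots x_n^{-\alpha_n} \le \sum_{i=1}^{n} \alpha_i x_i^{-1} \implies x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \ge \left( \sum_{i=1}^{n} \alpha_i x_i^{-1} \right)^{-1}.
\]
In other words, $M_0(x, \alpha) \ge M_{-1}(x, \alpha)$.

Of particular interest are these: $M_{-\infty} \le M_{-1} \le M_0 \le M_1 \le M_2 \le M_\infty$.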
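The chain of means above can be observed directly. The following sketch implements the weighted mean $M_p(x, \alpha)$ for every order, using the limiting expressions from Observations 2 and 3 for $p = 0$ and $p = \pm\infty$. The function name `weighted_mean` and the sample data are illustrative assumptions.

```python
import math

def weighted_mean(x, alpha, p):
    """Weighted mean M_p(x, alpha) of positive entries x with weights alpha summing to 1."""
    if p == 0:
        # M_0 is the weighted geometric mean.
        return math.exp(sum(a * math.log(v) for v, a in zip(x, alpha)))
    if p == math.inf:
        return max(x)
    if p == -math.inf:
        return min(x)
    return sum(a * v**p for v, a in zip(x, alpha)) ** (1 / p)

x = [1.0, 2.0, 4.0]
alpha = [1 / 3, 1 / 3, 1 / 3]
orders = [-math.inf, -1, 0, 1, 2, math.inf]
means = [weighted_mean(x, alpha, p) for p in orders]
# The means increase with the order p.
assert all(a <= b for a, b in zip(means, means[1:]))
```

Evaluating `weighted_mean(x, alpha, p)` for a very small nonzero `p` gives a value close to the geometric mean, consistent with $M_0 = \lim_{p \to 0} M_p$.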


Minkowski's Inequality. Let $x, y, \alpha \in \mathbb{R}^n$ with positive entries and with $\alpha_1 + \cdots + \alpha_n = 1$. Then
(i) $M_p(x + y, \alpha) \le M_p(x, \alpha) + M_p(y, \alpha)$ for $1 \le p < \infty$;
(ii) $M_p(x + y, \alpha) \ge M_p(x, \alpha) + M_p(y, \alpha)$ for $-\infty < p \le 1$.
If $p \ne 1$ is finite, then equality can only occur if $x$ and $y$ are proportional.

Proof. Of course, equality always occurs (in both (i) and (ii)) when $p = 1$, so we may suppose that $p \ne 1$.

First suppose that $1 < p < \infty$ and let $p'$ satisfy $1/p + 1/p' = 1$. Then $(p - 1)p' = p$. Next,
\[
\begin{aligned}
M_p(x + y, \alpha)^p &= \sum_{i=1}^{n} \alpha_i (x_i + y_i)^p = \sum_{i=1}^{n} \alpha_i (x_i + y_i)(x_i + y_i)^{p-1} \\
&= \sum_{i=1}^{n} \alpha_i x_i (x_i + y_i)^{p-1} + \sum_{i=1}^{n} \alpha_i y_i (x_i + y_i)^{p-1} \\
&\le \left[ \left( \sum_{i=1}^{n} \alpha_i x_i^p \right)^{1/p} + \left( \sum_{i=1}^{n} \alpha_i y_i^p \right)^{1/p} \right] \left( \sum_{i=1}^{n} \alpha_i (x_i + y_i)^{(p-1)p'} \right)^{1/p'} && (1) \\
&= \bigl[ M_p(x, \alpha) + M_p(y, \alpha) \bigr] \cdot M_p(x + y, \alpha)^{p-1}.
\end{aligned}
\]
Thus, $M_p(x + y, \alpha) \le M_p(x, \alpha) + M_p(y, \alpha)$. Equality occurs only if equality occurs in both applications of Hölder's inequality (in (1)); that is, only if $(x_i^p)$ is proportional to $((x_i + y_i)^p)$ is proportional to $(y_i^p)$; all of which translates to: $x$ is proportional to $y$.

For $0 < p < 1$, we know that Hölder's inequality reverses, so (nearly) the same proof yields $M_p(x + y, \alpha) \ge M_p(x, \alpha) + M_p(y, \alpha)$. The case $-\infty < p < 0$ follows for precisely the same reason for, in this case, the conjugate exponent $p'$ satisfies $0 < p' < 1$ (and Hölder's inequality reverses in this case, too).


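The three remaining cases are relatively easy to handle. We've actually shown the case $p = 0$, but using different notation:
\[
\begin{aligned}
\frac{M_0(x, \alpha) + M_0(y, \alpha)}{M_0(x + y, \alpha)} &= \frac{x_1^{\alpha_1} \cdots x_n^{\alpha_n} + y_1^{\alpha_1} \cdots y_n^{\alpha_n}}{(x_1 + y_1)^{\alpha_1} \cdots (x_n + y_n)^{\alpha_n}} \\
&= \left( \frac{x_1}{x_1 + y_1} \right)^{\alpha_1} \cdots \left( \frac{x_n}{x_n + y_n} \right)^{\alpha_n} + \left( \frac{y_1}{x_1 + y_1} \right)^{\alpha_1} \cdots \left( \frac{y_n}{x_n + y_n} \right)^{\alpha_n} \\
&\le \alpha_1 \frac{x_1}{x_1 + y_1} + \cdots + \alpha_n \frac{x_n}{x_n + y_n} + \alpha_1 \frac{y_1}{x_1 + y_1} + \cdots + \alpha_n \frac{y_n}{x_n + y_n} \\
&= 1.
\end{aligned}
\]
Equality can only occur if each of the sequences $(x_i/(x_i + y_i))$ and $(y_i/(x_i + y_i))$ is constant, from which you'll quickly deduce that $x$ and $y$ must be proportional.

Minkowski's inequality also holds in the cases $p = \pm\infty$, but the case for equality is a bit different. For example,
\[
M_\infty(x + y) = \max\{x_1 + y_1, \dots, x_n + y_n\} \le M_\infty(x) + M_\infty(y).
\]
Equality can only occur if $x$ and $y$ attain their maximum values at the same coordinate; i.e., if and only if, for some $k$, we have $x_k = M_\infty(x)$ and $y_k = M_\infty(y)$.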
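The change of direction at $p = 1$ in Minkowski's inequality is easy to see in a small experiment. The sketch below (with hypothetical names `weighted_mean` and `minkowski_gap`, and restricted to finite $p \ne 0$) computes $M_p(x, \alpha) + M_p(y, \alpha) - M_p(x + y, \alpha)$, which should be nonnegative for $p \ge 1$ and nonpositive for $p \le 1$.

```python
def weighted_mean(x, alpha, p):
    """Weighted mean of order p (finite, nonzero) for positive entries x."""
    return sum(a * v**p for v, a in zip(x, alpha)) ** (1 / p)

def minkowski_gap(x, y, alpha, p):
    """Return M_p(x) + M_p(y) - M_p(x + y) for the given weights."""
    total = [u + v for u, v in zip(x, y)]
    return (weighted_mean(x, alpha, p) + weighted_mean(y, alpha, p)
            - weighted_mean(total, alpha, p))

x, y, alpha = [1.0, 2.0], [3.0, 1.0], [0.5, 0.5]
print(minkowski_gap(x, y, alpha, 2.0))    # p > 1: gap is positive
print(minkowski_gap(x, y, alpha, 0.5))    # 0 < p < 1: gap is negative
print(minkowski_gap(x, y, alpha, -1.0))   # p < 0: gap is negative
```

When $y$ is a positive multiple of $x$, the gap vanishes (up to rounding) for every order, matching the equality case.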


The Passage to Infinite Series

It's natural to ask whether our work on finite sums (and elementary means) extends to infinite sums. For the most part, everything we've done has a satisfactory analogue in the infinite series case; however, the proofs can be surprisingly difficult, and simply passing to a limit is often not enough.


For now, we'll settle for stating a few of these extensions (with the occasional proof). 
As we'll see, there are more convenient paths to all of this (and more) through integrals. 
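Given a sequence of positive weights $\alpha = (\alpha_i)$ satisfying $\sum_{i=1}^\infty \alpha_i = 1$ and a positive number $0 < p < \infty$, we define

$$M_p(x,\alpha) = \left(\sum_{i=1}^\infty \alpha_i x_i^p\right)^{1/p}$$

for $x = (x_i)$ nonnegative. We’ll forego the case $p < 0$, but we will consider

$$M_0(x,\alpha) = \prod_{i=1}^\infty x_i^{\alpha_i} = \exp\left(\sum_{i=1}^\infty \alpha_i \log x_i\right)$$

and

$$M_\infty(x) = \sup_{i \ge 1} x_i = \sup\{x_i : i \ge 1\}.$$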


Facts.

1. If $M_s$ is finite for some $0 < s < \infty$, then $M_r$ is finite for all $0 < r < s$ and, in fact, $M_r \le M_s$, with equality if and only if $x$ is constant. Moreover, $M_0$ is finite (or zero) in this case (i.e., $M_0$ does not diverge to $+\infty$) and, in fact, $M_0 = \lim_{r\to 0^+} M_r$.

2. $M_0 \le M_1$ (AGM) with equality if and only if $x$ is constant.

Proof. From the inequality $\log x \le x - 1$ we have $\log x_k - \log M_1 \le (x_k/M_1) - 1$. Thus,

$$\sum_{k=1}^\infty \alpha_k \left(\log x_k - \log M_1\right) \le \sum_{k=1}^\infty \alpha_k \left(\frac{x_k}{M_1} - 1\right) \qquad (2)$$

or, $\log M_0 - \log M_1 \le 1 - 1 = 0$. Equality here would force equality in (2), which would mean that $x_k = M_1$ for all $k$.

3. If $x = (x_k)$ is bounded, then $\lim_{r\to\infty} M_r = M_\infty$.

Virtually all of our main results hold in the infinite series case with, at worst, minor modifications.
The Passage to Integrals

In what follows, we will consider nonnegative, finite (almost everywhere) functions $f : I \to \mathbb{R}$, all defined on some common interval (or measurable set) $I$, and all assumed to be (Riemann or Lebesgue) integrable, meaning that $\int_I f(x)\,dx$ exists as a finite real number. In situations where the particular interval has no bearing on the argument or, more commonly, is the same for all integrals, we may suppress it (and $x$) and simply write $\int f$.

We will only pursue a couple of (very familiar) inequalities.


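Hölder’s Inequality. Let $f$, $g : I \to \mathbb{R}$ be nonnegative integrable functions and let $\lambda$, $\mu > 0$ with $\lambda + \mu = 1$. Then $f^\lambda g^\mu$ is integrable and satisfies

$$\int f^\lambda g^\mu \le \left(\int f\right)^\lambda \left(\int g\right)^\mu.$$

Equality can only occur if, for some constants $A$ and $B$, not both zero, we have $Af = Bg$.

Proof. The proof is quite familiar by now: We may suppose that $u = \int f \ne 0$ and $v = \int g \ne 0$, in which case we have

$$\left(\frac{f}{u}\right)^\lambda \left(\frac{g}{v}\right)^\mu \le \lambda\,\frac{f}{u} + \mu\,\frac{g}{v},$$

with equality (everywhere) if and only if $f/u = g/v$. Now we integrate both sides:

$$\frac{1}{u^\lambda v^\mu} \int f^\lambda g^\mu \le \frac{\lambda}{u}\int f + \frac{\mu}{v}\int g = 1.$$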


A particular case is worth repeating:

Corollary. Let $1 < p < \infty$ and let $\frac{1}{p} + \frac{1}{q} = 1$. If $f^p$ and $g^q$ are integrable, then $fg$ is integrable and satisfies

$$\int fg \le \left(\int f^p\right)^{1/p} \left(\int g^q\right)^{1/q}.$$

Equality can only occur if $Af^p = Bg^q$ for some constants $A$ and $B$, not both zero.

As you can imagine, if $p < 1$, $p \ne 0$, and if $\int g^q$ is nonzero and finite, then the inequality reverses. We will forego the details.
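
The Corollary can be checked numerically on any pair of nonnegative functions. Here is a minimal sketch, assuming a simple midpoint rule for the integrals; the helper `integrate` and the particular choices of $f$, $g$, $p$, and the interval $[0,2]$ are illustrative only. Since the midpoint sums are themselves finite sums with equal weights, the discrete Hölder inequality guarantees the comparison regardless of quadrature error.

```python
import math

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Hypothetical test functions on I = [0, 2].
f = lambda x: 1.0 + x * x
g = lambda x: math.exp(-x)
p = 3.0
q = p / (p - 1.0)          # conjugate exponent: 1/p + 1/q = 1

lhs = integrate(lambda x: f(x) * g(x), 0.0, 2.0)
rhs = (integrate(lambda x: f(x) ** p, 0.0, 2.0) ** (1.0 / p)
       * integrate(lambda x: g(x) ** q, 0.0, 2.0) ** (1.0 / q))
print(f"int fg = {lhs:.6f} <= {rhs:.6f}")

# Equality case: g^q proportional to f^p, e.g. g = f^(p/q).
g_eq = lambda x: f(x) ** (p / q)
lhs_eq = integrate(lambda x: f(x) * g_eq(x), 0.0, 2.0)
rhs_eq = (integrate(lambda x: f(x) ** p, 0.0, 2.0) ** (1.0 / p)
          * integrate(lambda x: g_eq(x) ** q, 0.0, 2.0) ** (1.0 / q))
```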


Minkowski is next, but a slightly more general version that will come in handy later. 
Minkowski’s Inequality. Let $1 \le p < \infty$. If $|f|^p$ and $|g|^p$ are integrable, then so is $|f+g|^p$ and

$$\left(\int |f+g|^p\right)^{1/p} \le \left(\int |f|^p\right)^{1/p} + \left(\int |g|^p\right)^{1/p}. \qquad (3)$$

If $p = 1$, equality can only occur if $fg \ge 0$. If $p > 1$, equality can only occur if $fg \ge 0$ and $Af = Bg$ for some nonnegative constants $A$ and $B$, not both zero.

Proof. If $p = 1$, then

$$\int |f+g| \le \int \left(|f| + |g|\right) = \int |f| + \int |g|, \qquad (4)$$

hence $|f+g|$ is integrable. Equality in (4) forces $|f+g| = |f| + |g|$ (everywhere) and, hence, we must have $fg \ge 0$ (that is, $f(x)$ and $g(x)$ must have the same sign for all $x$).

If $p > 1$, first note that

$$|f+g|^p \le \left[\,|f| + |g|\,\right]^p \le \left[\,2\max\{|f|,|g|\}\,\right]^p = 2^p \max\{|f|^p, |g|^p\} \le 2^p\left(|f|^p + |g|^p\right).$$

Thus, $|f+g|^p$ will be integrable. Now let $q = p/(p-1)$ and apply Hölder’s inequality:

$$\int |f+g|^p = \int |f+g| \cdot |f+g|^{p-1}$$

$$\le \int |f| \cdot |f+g|^{p-1} + \int |g| \cdot |f+g|^{p-1} \qquad (5)$$

$$\le \left[\left(\int |f|^p\right)^{1/p} + \left(\int |g|^p\right)^{1/p}\right] \left(\int |f+g|^{(p-1)q}\right)^{1/q}. \qquad (6)$$

Since $(p-1)q = p$, dividing by $\left(\int |f+g|^p\right)^{1/q}$ gives (3). Note that equality in (3) forces equality in both (5) and (6). Thus we have $fg \ge 0$ and

$$|f|^p \sim |g|^p \sim |f+g|^p$$

(where “$\sim$” means “is proportional to”), which forces $A'f = B'g$ for some nonnegative constants $A'$ and $B'$, not both zero.
We next, all too briefly, consider integral means. Given a positive, integrable weight function $\alpha(x)$, we define

$$M_p(f,\alpha) = \left(\frac{\int f(x)^p\,\alpha(x)\,dx}{\int \alpha(x)\,dx}\right)^{1/p}$$

for $0 < p < \infty$ and $f \ge 0$ (and measurable, say, or even continuous). We also define

$$M_0(f,\alpha) = \exp\left(\frac{\int \alpha(x)\log f(x)\,dx}{\int \alpha(x)\,dx}\right)$$

and

$$M_\infty(f) = \sup_x f(x) \quad (= \operatorname{ess.sup}_x f(x)).$$

We could then develop the theory of the means $M_p(f,\alpha)$ in complete analogy to our previous cases. We will, however, settle for a single observation:

If $M_s(f,\alpha) < \infty$ for some $0 < s < \infty$, then $M_r(f,\alpha) < \infty$ for all $0 < r < s$ and $M_r(f,\alpha) \le M_s(f,\alpha)$, with equality if and only if $f$ is constant.

Proof. As before, the proof follows from Hölder’s inequality, applied to $p = s/r > 1$.

$$\left(\int f^r \alpha\right)^{1/r} = \left(\int f^r \alpha^{r/s}\,\alpha^{1-(r/s)}\right)^{1/r} \le \left(\int f^s \alpha\right)^{(r/s)(1/r)} \left(\int \alpha\right)^{(1/r)-(1/s)} = M_s(f,\alpha)\left(\int \alpha\right)^{1/r}.$$

This result should be compared with the non-weighted setting, the so-called $L_p$-norms

$$\|f\|_p = \left(\int_I |f(x)|^p\,dx\right)^{1/p},$$

which are incomparable if $I$ has infinite length and which otherwise satisfy

$$\|f\|_r \le (b-a)^{(1/r)-(1/s)}\,\|f\|_s$$

if $r < s$ and $I = [a,b]$ (to see this, just set $\alpha = 1$ in our previous calculation).
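
The comparison between $L_r$ and $L_s$ norms on a bounded interval can be illustrated numerically. This is a minimal sketch under stated assumptions: the helper `lp_norm` uses a midpoint rule, and the function, interval, and exponents are hypothetical sample choices.

```python
import math

def lp_norm(f, a, b, p, n=20000):
    """Midpoint-rule approximation of (integral over [a, b] of |f|^p)^(1/p)."""
    h = (b - a) / n
    return (h * sum(abs(f(a + (k + 0.5) * h)) ** p for k in range(n))) ** (1.0 / p)

a, b = 0.0, 2.0
r, s = 1.5, 4.0            # any exponents with 0 < r < s

f = lambda x: math.sin(3.0 * x) + x
lhs = lp_norm(f, a, b, r)
rhs = (b - a) ** (1.0 / r - 1.0 / s) * lp_norm(f, a, b, s)
print(f"||f||_r = {lhs:.6f} <= {rhs:.6f}")

# A constant function gives equality, matching the equality case for the means.
one = lambda x: 1.0
lhs_const = lp_norm(one, a, b, r)
rhs_const = (b - a) ** (1.0 / r - 1.0 / s) * lp_norm(one, a, b, s)
```

The factor $(b-a)^{(1/r)-(1/s)}$ is exactly what remains of $\left(\int \alpha\right)^{(1/r)-(1/s)}$ when $\alpha = 1$ on $[a,b]$.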
Problem Set 3A

A few calculus problems

1. Let $1 < p < \infty$ and let $q$ satisfy $1/p + 1/q = 1$; in other words, $(p-1)(q-1) = 1$. By examining the graph of $y = x^{p-1}$ (a.k.a. $x = y^{q-1}$), pictured at right, argue that

$$ab \le \int_0^a x^{p-1}\,dx + \int_0^b y^{q-1}\,dy,$$

with equality if and only if $b = a^{p-1}$ (equivalently, $a^p = b^q$). More generally, if $y = f(x)$ and $x = g(y)$ are strictly increasing, continuous, inverses of one another for $x$, $y \ge 0$, argue that

$$ab \le \int_0^a f(x)\,dx + \int_0^b g(y)\,dy,$$

with equality if and only if $b = f(a)$.

2. Establish the following (using, for example, the mean value theorem).

(a) $x/(1+x) \le \log(1+x) \le x$ for $x > -1$, with equality (in either inequality) only at $x = 0$.
(b) $e^x \ge 1 + x$ with equality only at $x = 0$. Conclude that $e^x \ge (1 + (x/n))^n$.
(c) Deduce that $\log(1+x) \le x/(1-x)$ for $-1 < x < 1$, with equality only at $x = 0$.
(d) Deduce that $e^x \le 1/(1-x)$ for $x < 1$, with equality only at $x = 0$.

3. Establish the following variations on Bernoulli’s inequality.

(i) $(1+x)^\alpha \le 1 + \alpha x$ for $x > -1$ and $0 < \alpha < 1$.
(ii) $(1+x)^\alpha \ge 1 + \alpha x$ for $x > -1$ and either $\alpha < 0$ or $\alpha > 1$.


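(iii) $(1-x)^\alpha \le 1 - \alpha x$ for $0 < x < 1$ and $0 < \alpha < 1$. [Hint: Write $1/(1-x) = $ …]

(iv) $(1-x)^\alpha \ge 1 - \alpha x$ for $0 < x < 1$ and $\alpha > 1$. [Hint: If $0 < x < 1/\alpha$, consider $(1 - \alpha x)^{1/\alpha}$.]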


4. Which is bigger, $\pi^e$ or $e^\pi$?
A few algebra problems

5. Show that $\sqrt[n+1]{ab^n} \le \dfrac{a + nb}{n+1}$ for all $n = 1, 2, \ldots$, with equality if and only if $a = b$.

6. (i) Show that $\dfrac{1}{x-1} + \dfrac{1}{x} + \dfrac{1}{x+1} > \dfrac{3}{x}$ for $x > 1$.
(ii) Use (i) to show that the harmonic series $\sum_{n=1}^\infty \frac{1}{n}$ diverges.

7. Without appealing to calculus, establish the following, where $a$, $b$, $c \ge 0$.
(i) $ab + bc + ca \le a^2 + b^2 + c^2$
(ii) $9abc \le (a+b+c)(ab+bc+ca)$
(iii) $abc(a+b+c) \le a^2b^2 + b^2c^2 + c^2a^2$

8. Show that $64 \le \left(1 + \frac{1}{x}\right)\left(1 + \frac{1}{y}\right)\left(1 + \frac{1}{z}\right)$ for $x$, $y$, $z > 0$, $x + y + z = 1$.

9. Show that $x + 1/(xy) + y^2 \ge 5(2^{-4/5})$ for any $x$, $y > 0$. Find values of $x$ and $y$ that yield equality.

10. Show that the rectangular box of volume $V$ having minimum surface area $S$ is a cube. [Hint: Show that $V^{2/3} \le S/6$, with equality occurring if and only if all three edges have equal length.]

11. Show that the right circular cylinder of volume $V$ having least surface area $S$ has its diameter equal to its height.

12. Given $a_1, \ldots, a_n > 0$, show that $\left(\sum_{i=1}^n a_i\right)\left(\sum_{i=1}^n \frac{1}{a_i}\right) \ge n^2$, with equality if and only if $a_1 = \cdots = a_n$.
Problem Set 3A, Problem 8

8. Show that $64 \le \left(1 + \frac{1}{x}\right)\left(1 + \frac{1}{y}\right)\left(1 + \frac{1}{z}\right)$ for $x$, $y$, $z > 0$, $x + y + z = 1$.

Solution. The function $f(x) = \log\left(1 + \frac{1}{x}\right) = \log(1+x) - \log x$ is strictly convex for $x > 0$ because $f''(x) = \frac{1}{x^2} - \frac{1}{(1+x)^2} > 0$ for $x > 0$. Thus, $f(x) + f(y) + f(z) \ge 3f\left(\frac{x+y+z}{3}\right) = 3f\left(\frac{1}{3}\right) = \log(4^3)$. Exponentiating, this becomes:

$$\left(1 + \frac{1}{x}\right)\left(1 + \frac{1}{y}\right)\left(1 + \frac{1}{z}\right) \ge 4^3 = 64.$$

By strict convexity, equality can only occur if $x = y = z = 1/3$.
Problem Set 3B

Notation: Fix $n \in \mathbb{N}$ and $\alpha = (\alpha_1, \ldots, \alpha_n)$, where $\alpha_i > 0$ for all $i$ and $\alpha_1 + \cdots + \alpha_n = 1$. For each nonzero real number $t$ and each $x = (x_1, \ldots, x_n)$ with $x_i > 0$, $i = 1, \ldots, n$, we define the weighted mean $M_t(x,\alpha)$ of order $t$ by $M_t(x,\alpha) = (\alpha_1 x_1^t + \cdots + \alpha_n x_n^t)^{1/t}$. In the remaining cases, $t = 0, \pm\infty$, we define $M_0(x,\alpha) = x_1^{\alpha_1}\cdots x_n^{\alpha_n}$, $M_\infty(x,\alpha) = M_\infty(x) = \max\{x_1,\ldots,x_n\}$, and $M_{-\infty}(x,\alpha) = M_{-\infty}(x) = \min\{x_1,\ldots,x_n\}$, respectively. In the case $\alpha_1 = \cdots = \alpha_n = 1/n$, we often omit reference to $\alpha$ and write $M_t(x)$, which is called the simple mean of order $t$.


1. Prove Hölder’s inequality in the case $t = 1$ ($t' = \pm\infty$) by showing $M_1(x)M_{-\infty}(y) \le M_1(xy) \le M_1(x)M_\infty(y)$, where $xy$ denotes the sequence $(x_1y_1, \ldots, x_ny_n)$. When does equality occur (in either of these inequalities)?

2. Prove Liapounov’s inequality: If $0 < r < s < t$ and if we write $s = \lambda r + \mu t$, where $\lambda$, $\mu > 0$, $\lambda + \mu = 1$, then $M_s(x)^s \le M_r(x)^{\lambda r} M_t(x)^{\mu t}$. Equality can only occur if all of the $x_i$ are equal. Upon taking logarithms, Liapounov’s inequality tells us that the function $s \mapsto s\log M_s$ is convex (for fixed $x$ and $\alpha$).

Recall that we have also defined the sums $S_t(x) = (x_1^t + \cdots + x_n^t)^{1/t}$ for $t > 0$. Note that $S_t(x) = n^{1/t}M_t(x)$. A somewhat more common notation is $\|x\|_t = (x_1^t + \cdots + x_n^t)^{1/t}$. These expressions are rarely used for $t < 0$, thus we are free to consider nonnegative $x_i$.

3. If $0 < p < q$, show that $S_q(x) \le S_p(x)$. Equality can only occur if all but one of the $x_i$ are zero. [Hint: The inequality is homogeneous in $x$, thus we may assume that $S_p(x) = 1$.] This is often called Jensen’s inequality. Please compare this result with the fact that $M_p(x) \le M_q(x)$.

4. Show that $\lim_{t\to\infty} S_t(x) = M_\infty(x)$. Thus, we define $S_\infty(x) = M_\infty(x)$.

5. Given $x > 0$, show that $\lim_{t\to 0^+}(1 + x^t)^{1/t} = +\infty$. Conclude that $\lim_{t\to 0^+} S_t(x) = +\infty$ if $x$ has two or more nonzero coordinates.

6. Show that the sums $S_t$ satisfy Hölder’s inequality: If $p > 1$ and $q$ satisfies $1/p + 1/q = 1$, then $S_1(xy) \le S_p(x)S_q(y)$. If $0 < p < 1$, the inequality reverses: $S_1(xy) \ge S_p(x)S_q(y)$. In every case, equality can only occur if $x^p$ and $y^q$ are proportional.

7. Show that $S_1(xy) \le S_1(x)S_\infty(y)$ (this is Hölder’s inequality in the case $p = 1$, $q = \infty$). When does equality occur?


8. Show that if $p$, $q > r > 0$ satisfy $1/p + 1/q = 1/r$, then $S_r(xy) \le S_p(x)S_q(y)$.

9. Given $0 < p < r < q$, write $r = \lambda p + \mu q$, where $\lambda$, $\mu > 0$, $\lambda + \mu = 1$. Show that $S_r(x)^r \le S_p(x)^{\lambda p} S_q(x)^{\mu q}$. Equality can only occur if all of the nonzero $x_i$ are equal.

10. Show that the sums $S_t$ satisfy Minkowski’s inequality: For $t > 1$ we have $S_t(x+y) \le S_t(x) + S_t(y)$, where $x + y$ denotes the sequence $(x_1+y_1, \ldots, x_n+y_n)$. For $0 < t < 1$, we have $S_t(x+y) \ge S_t(x) + S_t(y)$. In every case, equality can only occur if $x$ and $y$ are proportional. (Of course, equality always holds if $t = 1$.)

11. Show that $S_\infty(x+y) \le S_\infty(x) + S_\infty(y)$. When does equality occur in this case?


For $1 \le p \le \infty$, the expression $\|x\|_p = S_p(|x|)$ defines a norm on $\mathbb{R}^n$. In other words, $\|x\|_p$ satisfies: (i) $\|x\|_p \ge 0$ for all $x \in \mathbb{R}^n$ and $\|x\|_p = 0$ if and only if $x = 0$; (ii) $\|\alpha x\|_p = |\alpha|\,\|x\|_p$ for any scalar $\alpha \in \mathbb{R}$; and (iii) $\|x+y\|_p \le \|x\|_p + \|y\|_p$ for any $x$, $y \in \mathbb{R}^n$ (which is typically called the triangle inequality in this context). For $0 < p < 1$, (iii) reverses, thus $\|x\|_p$ is not a norm in this case. However, as we’ll see directly, for $0 < p < 1$, the expression $d(x,y) = \|x-y\|_p^p$ defines a translation invariant metric on $\mathbb{R}^n$.

12. If $p > 1$, show that $\sum_{i=1}^n (x_i + y_i)^p \ge \sum_{i=1}^n x_i^p + \sum_{i=1}^n y_i^p$. If $0 < p < 1$, the inequality reverses. If $p \ne 1$, equality can only occur if $x_iy_i = 0$ for all $i$ (in other words, $x_i$ and $y_i$ cannot both be nonzero).

13. For $0 < p < 1$, show that the expression $d(x,y) = \|x-y\|_p^p$ satisfies: (i) $d(x,y) \ge 0$ for all $x$, $y \in \mathbb{R}^n$ and $d(x,y) = 0$ if and only if $x = y$; (ii) $d(x,y) = d(y,x)$ for all $x$, $y \in \mathbb{R}^n$; (iii) $d(x,y) = d(x-y,0) = d(x+z,y+z)$ for all $x$, $y$, $z \in \mathbb{R}^n$; and (iv) $d(x,y) \le d(x,z) + d(z,y)$ for all $x$, $y$, $z \in \mathbb{R}^n$.
Stirling’s Formula

Recall (from Problem 14, Problem Set 1) that $(n/3)^n < n! < (n/2)^n$ for $n \ge 6$. Thus, it’s reasonable to ask whether $n! \sim (n/a)^n$ for some constant $a$ (where $\sim$ means that the ratio $n!/(n/a)^n \to 1$ as $n \to \infty$). If you guessed that $a = e$, you’d be right, but the precise order of magnitude (including error estimates) takes a bit more work.

We begin with a clever summation formula, due essentially to Euler.

Euler-Maclaurin Summation. If $f$ has a continuous derivative on $[1,n]$, then

$$\sum_{k=1}^n f(k) = \int_1^n f(x)\,dx + \int_1^n (x - [x])\,f'(x)\,dx + f(1),$$

where $[x]$ denotes the greatest integer $\le x$.


Proof. (If you happen to know Stieltjes integration, there’s a very short proof! For simplicity, we’ll settle for a slightly longer but still elementary proof.) We begin with the “error”:

$$\sum_{k=1}^{n-1} f(k) - \int_1^n f(x)\,dx = \sum_{k=1}^{n-1} \int_k^{k+1} \left(f(k) - f(x)\right) dx.$$

(Please note that the upper limit on the sums is now $n - 1$.) We are going to integrate by parts on the right-hand side, using the variables $u = f(k) - f(x)$ and $v = x - k - 1$ (!). With this choice we’ll have $u(k) = 0 = v(k+1)$, which will simplify things considerably. Watch closely!

$$\sum_{k=1}^{n-1} f(k) - \int_1^n f(x)\,dx = \sum_{k=1}^{n-1} \int_k^{k+1} (x - k - 1)\,f'(x)\,dx$$

$$= \sum_{k=1}^{n-1} \int_k^{k+1} (x - [x] - 1)\,f'(x)\,dx$$

$$= \int_1^n (x - [x])\,f'(x)\,dx + f(1) - f(n).$$

Adding $f(n)$ to both sides completes the proof.
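
The summation formula is easy to test numerically, since on each interval $[k, k+1]$ the factor $x - [x]$ is simply $x - k$. The following is a minimal sketch, assuming composite Simpson quadrature on each unit interval; the helper names `simpson`, `euler_rhs`, and `direct_sum` are illustrative, not part of the notes.

```python
import math

def simpson(g, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    return (b - a) / 6.0 * (g(a) + 4.0 * g(0.5 * (a + b)) + g(b))

def euler_rhs(f, df, n, panels=100):
    """int_1^n f + int_1^n (x - [x]) f'(x) dx + f(1), integrating piece by piece."""
    total = f(1)
    h = 1.0 / panels
    for k in range(1, n):
        for j in range(panels):
            a = k + j * h
            b = k + (j + 1) * h
            total += simpson(f, a, b)
            # On [k, k+1] the integer part [x] equals k (up to the right endpoint).
            total += simpson(lambda x: (x - k) * df(x), a, b)
    return total

def direct_sum(f, n):
    """Left-hand side: f(1) + f(2) + ... + f(n)."""
    return sum(f(k) for k in range(1, n + 1))

square = lambda x: x * x
d_square = lambda x: 2.0 * x
print(direct_sum(square, 10), euler_rhs(square, d_square, 10))
print(direct_sum(math.log, 10), euler_rhs(math.log, lambda x: 1.0 / x, 10))
```

For $f(x) = x^2$ every integrand is a polynomial of degree at most 2, so Simpson’s rule introduces no quadrature error; for $f(x) = \log x$ the two sides agree up to a small numerical error.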


Now Euler made another helpful observation: If we replace x — [x] by x — [x] — 1/2, 
we’ll enhance the likelihood of convergence of the second integral (as n — oo) because we’ll 


introduce the possibility of cancellations. 
Corollary. $\displaystyle \sum_{k=1}^n f(k) = \int_1^n f(x)\,dx + \int_1^n (x - [x] - 1/2)\,f'(x)\,dx + [f(1) + f(n)]/2.$

Corollary. $\displaystyle \log(n!) = (n + 1/2)\log n - n + 1 + \int_1^n \frac{x - [x] - 1/2}{x}\,dx.$

Proof.

$$\log(n!) = \sum_{k=1}^n \log k = \int_1^n \log x\,dx + \int_1^n \frac{x - [x] - 1/2}{x}\,dx + \frac{1}{2}\log n$$

$$= (n + 1/2)\log n - n + 1 + \int_1^n \frac{x - [x] - 1/2}{x}\,dx.$$

In other words, we’ve just shown that

$$a_n = \log\left(\frac{n!\,e^n}{n^{n+1/2}}\right) = 1 + \int_1^n \frac{x - [x] - 1/2}{x}\,dx. \qquad (1)$$

We next show that $(a_n)$ converges using a standard technique from advanced calculus.

Dirichlet’s Test for Integrals. If $\int_1^x f(t)\,dt$ is bounded for $x \ge 1$ and if $g(x)$ decreases to zero as $x \to \infty$, then $\int_1^\infty f(x)\,g(x)\,dx$ exists (as an improper Riemann integral).

It’s not hard to see that

$$\left|\int_1^x (t - [t] - 1/2)\,dt\right| = \left|\int_{[x]}^x (t - [t] - 1/2)\,dt\right| \le \int_0^1 |t - 1/2|\,dt = \frac{1}{4}.$$

Thus, $\int_1^\infty \frac{x - [x] - 1/2}{x}\,dx$ exists. This proves that $(a_n)$ converges to a finite limit and, hence, that $b_n = e^{a_n}$ converges to a positive, finite limit $C$; that is, we’ve shown that

$$C = \lim_{n\to\infty} \frac{n!\,e^n}{n^{n+1/2}} \quad \text{exists.}$$
This result is essentially due to DeMoivre in 1730. Finding the precise value of $C$, together with some quantitative error estimates, will take a bit more work.

The Gamma Function

We begin with a continuous extension of $n!$ initiated by an observation of Euler’s:

$$n! = \int_0^1 (-\log x)^n\,dx = \int_0^\infty t^n e^{-t}\,dt.$$

This motivated Legendre to consider the function

$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt, \qquad x > 0.$$

It’s not hard to see that $\Gamma(x) < \infty$ for all $x > 0$. Indeed, given $x > 0$, we can find $t_0 = t_0(x)$ sufficiently large so that $t^{x-1}e^{-t} \le e^{-t/2}$ for all $t \ge t_0$. Thus,

$$\Gamma(x) \le \int_0^{t_0} t^{x-1}\,dt + \int_{t_0}^\infty e^{-t/2}\,dt = \frac{t_0^x}{x} + 2e^{-t_0/2} < \infty.$$

We next show that $\Gamma(x)$ is continuous, by showing that it’s convex!
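Theorem. (i) $\Gamma(1) = 1$.
(ii) $\Gamma(x+1) = x\,\Gamma(x)$; hence, $\Gamma(n+1) = n!$.
(iii) $\Gamma$ is log-convex; that is, $\log\Gamma(x)$ is convex.
(iv) $\Gamma$ is convex, hence continuous.

Proof. (i) is clear: $\int_0^\infty e^{-t}\,dt = 1$. (ii) follows from integration by parts:

$$\Gamma(x+1) = \int_0^\infty t^x e^{-t}\,dt = \int_0^\infty t^x\,d(-e^{-t})$$

$$= \Big[-t^x e^{-t}\Big]_{t=0}^{t=\infty} - \int_0^\infty (-e^{-t})\,x\,t^{x-1}\,dt$$

$$= x\int_0^\infty t^{x-1}e^{-t}\,dt = x\,\Gamma(x).$$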
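To establish (iii), we’ll use Hölder’s inequality. Given $\lambda$, $\mu > 0$ with $\lambda + \mu = 1$, we have:

$$\Gamma(\lambda x + \mu y) = \int_0^\infty t^{\lambda x + \mu y - 1} e^{-t}\,dt = \int_0^\infty \left[t^{x-1}e^{-t}\right]^\lambda \left[t^{y-1}e^{-t}\right]^\mu dt \le \Gamma(x)^\lambda\,\Gamma(y)^\mu,$$

and it follows that $\log\Gamma$ is convex. Finally, (iv) is a general principle: Every log-convex function is also convex. This follows from Young’s inequality:

$$\Gamma(\lambda x + \mu y) \le \Gamma(x)^\lambda\,\Gamma(y)^\mu \le \lambda\,\Gamma(x) + \mu\,\Gamma(y).$$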
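
Both inequalities in this chain can be spot-checked with the standard library’s `math.gamma`. This is only an illustrative sketch; the helper name `gamma_chain` and the sample points are assumptions made for the example.

```python
import math

def gamma_chain(x, y, lam):
    """Return (Gamma(lam*x + mu*y), Gamma(x)^lam * Gamma(y)^mu, lam*Gamma(x) + mu*Gamma(y))."""
    mu = 1.0 - lam
    middle = math.gamma(lam * x + mu * y)
    geometric = math.gamma(x) ** lam * math.gamma(y) ** mu
    arithmetic = lam * math.gamma(x) + mu * math.gamma(y)
    return middle, geometric, arithmetic

# Hypothetical sample points (x, y, lambda) with x, y > 0 and 0 < lambda < 1.
samples = [(0.3, 2.5, 0.25), (1.0, 4.0, 0.5), (0.1, 0.9, 0.7), (5.0, 7.5, 0.4)]
for x, y, lam in samples:
    middle, geometric, arithmetic = gamma_chain(x, y, lam)
    print(f"{middle:.6f} <= {geometric:.6f} <= {arithmetic:.6f}")
```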


Corollary. As $x \to 0^+$ we have $x\,\Gamma(x) \to 1$ and, hence, $\Gamma(x) \to +\infty$.

Proof. $x\,\Gamma(x) = \Gamma(x+1) \to \Gamma(1) = 1$ as $x \to 0^+$. It follows that $\Gamma(x) = \dfrac{\Gamma(x+1)}{x} \to +\infty$ as $x \to 0^+$.


We next prove a remarkable characterization due to Artin in 1964. 


Theorem. Suppose that $f : (0,\infty) \to (0,\infty)$ satisfies (i) $f(1) = 1$; (ii) $f(x+1) = x f(x)$; and (iii) $f$ is log-convex. Then $f = \Gamma$.

Proof. Of course, by (i) and (ii), $f(n+1) = n!$ for $n = 0, 1, 2, \ldots$. Next, given $0 < x \le 1$, we use (ii) and (iii) to estimate $f(n+1+x)$. Watch closely!

$$f(n+1+x) = f(n + 1 + x - nx - x + nx + x)$$

$$= f\big((1-x)(n+1) + x(n+2)\big)$$

$$\le f(n+1)^{1-x}\,f(n+2)^x$$

$$= f(n+1)^{1-x}\,\big[(n+1)f(n+1)\big]^x$$

$$= (n+1)^x f(n+1)$$

$$= (n+1)^x\,n!$$
Also,

$$n! = f(n+1) = f\big(x(n+x) + (1-x)(n+1+x)\big)$$

$$\le f(n+x)^x\,f(n+1+x)^{1-x}$$

$$= (n+x)^{-x}\,f(n+1+x)^x\,f(n+1+x)^{1-x}$$

$$= (n+x)^{-x}\,f(n+1+x).$$

That is, $f(n+1+x) \ge (n+x)^x\,n!$.

Now we use (ii) to write these inequalities in terms of $f(x)$.

$$f(n+1+x) = (n+x)\,f(n+x) = (n+x)(n-1+x)\,f(n-1+x) = \cdots = (n+x)(n-1+x)\cdots x\,f(x).$$

Thus,

$$(n+x)^x\,n! \le (n+x)(n-1+x)\cdots x\,f(x) \le (n+1)^x\,n!$$

or, after dividing by $n^x\,n!$,

$$\left(1 + \frac{x}{n}\right)^x \le \frac{(n+x)(n-1+x)\cdots x\,f(x)}{n^x\,n!} \le \left(1 + \frac{1}{n}\right)^x.$$

For fixed $x$, both extremes tend to 1 as $n \to \infty$, so we’ve shown that

$$f(x) = \lim_{n\to\infty} \frac{n^x\,n!}{(n+x)(n-1+x)\cdots x} \qquad (2)$$

for $0 < x \le 1$. We next show that (2) holds for $x > 1$ as well.

Given $x > 1$, there is a positive integer $m$ with $0 < x - m \le 1$. Then

$$f(x) = (x-1)(x-2)\cdots(x-m)\,f(x-m)$$

$$= \lim_{n\to\infty} \frac{n^x\,n!}{(n+x)\cdots x} \cdot \frac{(n+x)\cdots(n+x-(m-1))}{n^m} \qquad (3)$$

$$= \lim_{n\to\infty} \frac{n^x\,n!}{(n+x)\cdots x}$$

because the second factor in (3) consists of $m$ terms each, top and bottom, and $m$ and $x$ are fixed; thus, the limit of this factor is 1 as $n \to \infty$. Thus, (2) holds for all $x > 0$. In other words, there can be only one function satisfying the hypotheses of the theorem.


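Corollary. $\displaystyle \Gamma(x) = \lim_{n\to\infty} \frac{n^x\,n!}{(n+x)(n-1+x)\cdots x}$ for all $x > 0$.

In particular, $\displaystyle \Gamma(1/2) = \lim_{n\to\infty} \frac{n^{1/2}\,n!}{(n+1/2)(n-1/2)\cdots(1/2)}$. But we can also find this limit by alternate means:

$$\Gamma(1/2) = \int_0^\infty t^{-1/2}e^{-t}\,dt = 2\int_0^\infty e^{-t}\,d(t^{1/2}) = 2\int_0^\infty e^{-u^2}\,du = \sqrt{\pi}.$$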
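
The limit formula converges rather slowly (the relative error is on the order of $1/n$), which a short computation makes visible. The following is a minimal sketch: the helper name `gamma_limit` is an illustrative choice, and logarithms are used only to avoid overflow in $n!$.

```python
import math

def gamma_limit(x, n):
    """n^x n! / ((n+x)(n-1+x)...(1+x) x), computed through logarithms."""
    log_value = x * math.log(n) + math.lgamma(n + 1)
    log_value -= sum(math.log(k + x) for k in range(n + 1))
    return math.exp(log_value)

for n in (10, 1000, 100000):
    print(n, gamma_limit(0.5, n), math.sqrt(math.pi))

# The two-sided estimate from the proof, with f = Gamma and 0 < x <= 1:
# (1 + x/n)^x <= Gamma(x) / gamma_limit(x, n) <= (1 + 1/n)^x.
x, n = 0.5, 50
ratio = math.gamma(x) / gamma_limit(x, n)
lower, upper = (1 + x / n) ** x, (1 + 1 / n) ** x
```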


Return to Stirling’s Formula 


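If you’ll recall, we’ve shown that $b_n = \dfrac{n!\,e^n}{n^{n+1/2}} \to C$. Thus,

$$\frac{b_n^2}{b_{2n}} = \frac{(n!)^2 e^{2n}}{n^{2n+1}} \cdot \frac{(2n)^{2n+1/2}}{(2n)!\,e^{2n}} \to C.$$

But

$$\frac{2^{2n}\,n!}{(2n)!} = \frac{2^{2n}\,n!}{(2n)(2n-1)(2n-2)\cdots 3\cdot 2\cdot 1} = \frac{n!}{n(n-1/2)(n-1)\cdots(3/2)\cdot 1\cdot(1/2)} = \frac{1}{(n-1/2)(n-3/2)\cdots(3/2)\cdot(1/2)}.$$

So,

$$\frac{b_n^2}{b_{2n}} = \sqrt{2}\cdot\frac{n^{1/2}\,n!}{(n+1/2)(n-1/2)\cdots(3/2)\cdot(1/2)}\cdot\frac{n+1/2}{n} \to \sqrt{2}\,\Gamma(1/2) = \sqrt{2\pi}.$$

Thus, we’ve finally arrived at Stirling’s Formula.

$$\lim_{n\to\infty} \frac{n!\,e^n}{n^{n+1/2}} = \sqrt{2\pi}$$

or, in the notation from the beginning of this section, $n! \sim \sqrt{2\pi}\cdot n^{n+1/2}\cdot e^{-n}$.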
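
A quick numerical look at $b_n$ shows the convergence to $\sqrt{2\pi}$. This is only a sketch: the helper name `b` mirrors the sequence $b_n$ above, and `math.lgamma` is used (an implementation choice, not part of the notes) so that large factorials do not overflow.

```python
import math

def b(n):
    """b_n = n! e^n / n^(n + 1/2), evaluated through logarithms to avoid overflow."""
    return math.exp(math.lgamma(n + 1) + n - (n + 0.5) * math.log(n))

target = math.sqrt(2.0 * math.pi)
for n in (1, 10, 100, 10**4, 10**6):
    print(n, b(n), b(n) ** 2 / b(2 * n), target)
```

Both $b_n$ and the ratio $b_n^2/b_{2n}$ used in the argument approach the same limit.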
Stirling’s Series

Finally, let’s talk about error estimates. In addition to the qualitative statement that

$$\frac{n!\,e^n}{\sqrt{2\pi}\cdot n^{n+1/2}} \to 1 \quad \text{as } n \to \infty,$$

Stirling provided quantitative estimates on

$$r_n = \log\left(\frac{n!\,e^n}{\sqrt{2\pi}\cdot n^{n+1/2}}\right).$$

He showed that

$$r_n \sim \frac{A}{n} - \frac{B}{n^3} + \cdots$$

for certain constants $A$, $B$, etc., where $\sim$ means that $r_n$ lies between successive partial sums of the (divergent) alternating series (called Stirling’s series). For example,

$$\frac{A}{n} - \frac{B}{n^3} < r_n < \frac{A}{n}$$

for suitable $A$ and $B$. We’ll settle for the modest estimate:

Lemma. $0 < r_n < 1/12n$ for all $n$. The constant $1/12$ cannot be improved.

Proof. The proof makes repeated use of the fact that if a sequence (or function) decreases to 0 as $n \to \infty$ (resp., $x \to \infty$), then it must be positive. Similarly, if a sequence (or function) increases to 0, then it must be negative.

In order to show that $r_n > 0$, then, it suffices to show that $(r_n)$ is decreasing (because we already know that $r_n \to 0$). Thus we want to show that

$$r_{n+1} - r_n = \log\left(\frac{(n+1)!\,e^{n+1}}{\sqrt{2\pi}\cdot(n+1)^{n+1+1/2}} \cdot \frac{\sqrt{2\pi}\cdot n^{n+1/2}}{n!\,e^n}\right) = 1 - (n + 1/2)\log\left(1 + \frac{1}{n}\right) < 0$$

for all $n$. For this, it suffices to show that

$$\log\left(1 + \frac{1}{n}\right) - \frac{1}{n + 1/2} > 0$$
for all $n$. To this end, notice that the function $g(x) = \log(1 + 1/x) - 1/(x + 1/2)$ tends to 0 as $x \to \infty$. If we can show that $g$ is decreasing, we’ll know that $g(x) > 0$. But

$$g'(x) = \frac{1}{(x+1/2)^2} - \frac{1}{x(x+1)} = \frac{1}{x^2 + x + 1/4} - \frac{1}{x^2 + x} < 0$$

for $x > 0$. Thus we’ve shown that $(r_n)$ decreases to 0 and, hence, that $r_n > 0$ for all $n$.

We next find a constant $A > 0$ so that $r_n < A/n$ for all $n$. For this, it suffices to show that $a_n = r_n - (A/n) < 0$ for all $n$. But $a_n \to 0$, so it suffices to find $A$ such that $(a_n)$ is increasing. That is, we want

$$a_{n+1} - a_n = 1 - (n + 1/2)\log\left(1 + \frac{1}{n}\right) + A\left(\frac{1}{n} - \frac{1}{n+1}\right) > 0$$

for all $n$. This will be the case if

$$g(x) = \frac{1}{x + 1/2} - \log\left(1 + \frac{1}{x}\right) + \frac{A}{x(x+1)(x+1/2)} > 0$$

for all $x > 0$. Again, we know that $g(x) \to 0$ as $x \to \infty$, so it would suffice to find $A$ so that $g'(x) < 0$ for all $x > 0$. Hang on!

$$g(x) = \frac{2}{2x+1} - \log\left(\frac{x+1}{x}\right) + \frac{2A}{x(x+1)(2x+1)}$$

$$\Longrightarrow\quad g'(x) = -\,\frac{x(x+1) - A(12x^2 + 12x + 2)}{x^2(x+1)^2(2x+1)^2}\cdot(-1) = \frac{x(x+1) - A(12x^2 + 12x + 2)}{x^2(x+1)^2(2x+1)^2}.$$

Thus it suffices to find $A$ such that

$$A - h(x) = A - \frac{x(x+1)}{12x^2 + 12x + 2} \ge 0. \qquad (4)$$

But $h(x) \to 1/12$ as $x \to \infty$ and

$$h'(x) = \frac{2x + 1}{2(6x^2 + 6x + 1)^2} > 0$$

for $x > 0$. That is, $h(x)$ strictly increases to $1/12$ as $x \to \infty$, so $A = 1/12$ is the smallest possible upper bound in (4). Because this is such a roundabout proof, let’s summarize what we’ve done: $A = 1/12 \Longrightarrow g' < 0 \Longrightarrow g > 0 \Longrightarrow (a_n)$ increasing $\Longrightarrow a_n < 0 \Longrightarrow r_n < 1/12n$. Phew!!
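
The Lemma can be checked directly for small $n$. Here is a minimal sketch, assuming $r_n$ is the logarithm of $n!\,e^n/(\sqrt{2\pi}\,n^{n+1/2})$ as in the proof above; the helper name `r` is illustrative, and the range of $n$ is kept modest so that floating-point rounding stays well below the gap $1/12n - r_n$.

```python
import math

def r(n):
    """r_n = log( n! e^n / (sqrt(2 pi) n^(n + 1/2)) ), computed with lgamma."""
    return (math.lgamma(n + 1) + n
            - (n + 0.5) * math.log(n)
            - 0.5 * math.log(2.0 * math.pi))

for n in (1, 2, 5, 10, 100):
    print(n, r(n), 1.0 / (12 * n))

# n * r_n approaches 1/12, which is why the constant cannot be improved.
scaled = [n * r(n) for n in (10, 50, 100)]
```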


Conclusion. For all $n = 1, 2, 3, \ldots$, we have

$$1 < \frac{n!\,e^n}{\sqrt{2\pi}\cdot n^{n+1/2}} < e^{1/12n}.$$


Problem Set 4 


Notation: A positive function $f : I \to (0,\infty)$ defined on an interval $I$ is said to be log-convex if $\log f$ is convex. Equivalently, $f$ is log-convex if $f(\lambda x + \mu y) \le [f(x)]^\lambda\,[f(y)]^\mu$ whenever $x$, $y \in I$, $\lambda$, $\mu > 0$, $\lambda + \mu = 1$. From Young’s inequality, a log-convex function is also convex; thus, the log-convex functions form a subset of the convex functions.


1. (i) If $a$, $b > 0$, show that $f(x) = a\cdot b^x$ is log-convex on $\mathbb{R}$.
(ii) Show that $g(x) = x$ is convex but not log-convex on $(0,\infty)$.

2. If $f$ and $g$ are log-convex, show that $fg$ and $f + g$ are log-convex.

3. For $x$ and $\alpha$ fixed, show that the function $f(t) = M_t(x,\alpha)^t$ is log-convex for $t > 0$.

4. Use the log-convexity of $\Gamma(x)$ to show that $\sqrt{\dfrac{2}{n+1}} \le \dfrac{\Gamma\left(\frac{1}{2}(n+1)\right)}{\Gamma\left(\frac{1}{2}(n+2)\right)} \le \sqrt{\dfrac{2}{n}}$ for $n = 1, 2, 3, \ldots$.


Evaluate [ pret dt. 
0 


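6. Show that $\Gamma\left(n + \dfrac{1}{2}\right) = \dfrac{(2n)!\,\sqrt{\pi}}{n!\,4^n}$ for $n = 0, 1, 2, \ldots$.

7. Show that $\dbinom{2n}{n} \sim \dfrac{2^{2n}}{\sqrt{\pi n}}$. [Recall that $a_n \sim b_n$ means that $a_n/b_n \to 1$.]

8. Establish Wallis’s formula $\dfrac{\pi}{2} = \lim_{n\to\infty} \dfrac{2}{1}\cdot\dfrac{2}{3}\cdot\dfrac{4}{3}\cdot\dfrac{4}{5}\cdots\dfrac{(2n)(2n)}{(2n-1)(2n+1)}$. [Hint: $\pi/2 = (1/2)(\Gamma(1/2))^2$.]

9. Prove Legendre’s duplication formula: $\Gamma\left(\dfrac{x}{2}\right)\Gamma\left(\dfrac{x+1}{2}\right) = \dfrac{\sqrt{\pi}}{2^{x-1}}\,\Gamma(x)$. [Hint: Show that $f(x) = (2^{x-1}/\sqrt{\pi})\,\Gamma(x/2)\,\Gamma((x+1)/2)$ satisfies the hypotheses of Artin’s theorem.]

10. Consider $f(x) = \displaystyle\int_0^\infty \frac{t^{x-1}}{1+t}\,dt$ for $0 < x < 1$. [Euler showed that $f(x) = \pi/\sin(\pi x)$.]
(a) Use the fact that $(1+t)^{-1} = \int_0^\infty e^{-(1+t)s}\,ds$ to show that $f(x) = \Gamma(x)\,\Gamma(1-x)$.
(b) Show that $f$ is log-convex on $(0,1)$.
(c) Conclude that $\pi = f(1/2) \le f(x)$ for all $0 < x < 1$.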
Inner Product Spaces

An inner product space is a vector space $X$ over $\mathbb{C}$ (or $\mathbb{R}$) together with a scalar-valued function $\langle x, y\rangle$ of two variables, called an inner product, satisfying:

(i) $\langle\cdot,\cdot\rangle : X \times X \to \mathbb{C}$ (or $\mathbb{R}$)
(ii) $\langle x, y\rangle = \overline{\langle y, x\rangle}$ for $x$, $y \in X$
(iii) $\langle \alpha x + \beta y, z\rangle = \alpha\langle x, z\rangle + \beta\langle y, z\rangle$ for $x$, $y$, $z \in X$ and $\alpha$, $\beta \in \mathbb{C}$

From (ii), the inner product isn’t fully linear in its second coordinate, so we sometimes say that $\langle\cdot,\cdot\rangle$ is sesquilinear (or one-and-a-half-linear).

(iv) $\langle x, x\rangle \ge 0$ for any $x \in X$; $\langle x, x\rangle = 0$ if and only if $x = 0$.

Note that by linearity we have $\langle x, 0\rangle = 0 = \langle 0, x\rangle$ for any $x \in X$. Moreover, from (iv), if $\langle x, y\rangle = 0$ for all $y \in X$, then $x = 0$ (just take $y = x$).


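Examples

1. $\mathbb{C}^n$ with its usual inner product $\langle x, y\rangle = y^* x = \sum_{k=1}^n x_k\overline{y_k}$.

2. Given continuous functions $f$, $g : [a,b] \to \mathbb{C}$, the expression $\langle f, g\rangle = \int_a^b f(x)\,\overline{g(x)}\,dx$ defines an inner product. Given a strictly positive weight function $w : [a,b] \to \mathbb{R}$, we might also consider $\langle f, g\rangle = \int_a^b f(x)\,\overline{g(x)}\,w(x)\,dx$.

In the remainder of this section, we’ll assume that $X$ is an inner product space over $\mathbb{C}$. As we’ll see directly, an inner product induces a norm on $X$ by setting

$$\|x\| = \sqrt{\langle x, x\rangle}. \qquad (1)$$

In terms of our first two examples we have

$$\|x\| = \left(\sum_{k=1}^n |x_k|^2\right)^{1/2} \quad \text{on } \mathbb{C}^n$$

and

$$\|f\| = \left(\int_a^b |f(x)|^2\,dx\right)^{1/2} \quad \left[\text{or } \left(\int_a^b |f(x)|^2\,w(x)\,dx\right)^{1/2}\right] \quad \text{on } C[a,b].$$

A key step in verifying that equation (1) defines a norm is given by: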
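The Cauchy-Schwarz Inequality. In any inner product space, $|\langle x, y\rangle| \le \|x\|\,\|y\|$, with equality if and only if $x$ and $y$ are linearly dependent (that is, parallel).

In terms of our initial examples,

$$\left|\sum_{k=1}^n x_k\overline{y_k}\right| \le \left(\sum_{k=1}^n |x_k|^2\right)^{1/2}\left(\sum_{k=1}^n |y_k|^2\right)^{1/2}$$

and

$$\left|\int_a^b f(x)\,\overline{g(x)}\,dx\right| \le \left(\int_a^b |f(x)|^2\,dx\right)^{1/2}\left(\int_a^b |g(x)|^2\,dx\right)^{1/2}.$$

Proof. By our earlier remarks, we may assume that $x$, $y \ne 0$. Now, given $\alpha \in \mathbb{C}$, note that

$$0 \le \|x - \alpha y\|^2 = \langle x - \alpha y, x - \alpha y\rangle$$

$$= \|x\|^2 - \alpha\,\overline{\langle x, y\rangle} - \overline{\alpha}\,\langle x, y\rangle + |\alpha|^2\|y\|^2$$

$$= \|x\|^2 - 2\,\mathrm{Re}\big(\overline{\alpha}\,\langle x, y\rangle\big) + |\alpha|^2\|y\|^2.$$

In particular, setting $\alpha = \langle x, y\rangle/\langle y, y\rangle = \langle x, y\rangle/\|y\|^2$ leads to

$$0 \le \|x\|^2 - 2\,\frac{|\langle x, y\rangle|^2}{\|y\|^2} + \frac{|\langle x, y\rangle|^2}{\|y\|^4}\,\|y\|^2 = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2}.$$

We leave the case for equality as an exercise (just examine the proof closely).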
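
The proof is concrete enough to trace on a small example in $\mathbb{C}^n$. The following is only a sketch: the helper names `inner` and `norm` and the sample vectors are illustrative assumptions. It checks the inequality, the orthogonality of $x - \alpha y$ to $y$ for the chosen $\alpha$, and the equality case for parallel vectors.

```python
def inner(x, y):
    """Usual inner product on C^n: sum of x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return inner(x, x).real ** 0.5

# Hypothetical sample vectors in C^3.
x = [1 + 2j, 3 - 1j, 0.5j]
y = [2 - 1j, 1j, 4 + 0j]

lhs = abs(inner(x, y))
rhs = norm(x) * norm(y)
print(f"|<x,y>| = {lhs:.6f} <= {rhs:.6f}")

# The choice of alpha from the proof makes x - alpha*y orthogonal to y.
alpha = inner(x, y) / inner(y, y)
residual = [a - alpha * b for a, b in zip(x, y)]

# Equality case: z is a scalar multiple of x.
z = [(2 - 1j) * a for a in x]
```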


It follows from the Cauchy-Schwarz inequality that

$$\frac{|\mathrm{Re}\,\langle x, y\rangle|}{\|x\|\,\|y\|} \le \frac{|\langle x, y\rangle|}{\|x\|\,\|y\|} \le 1.$$

Thus, we can (and will!) define the angle $\theta$ between (nonzero) vectors $x$ and $y$ by declaring

$$\cos\theta = \frac{\mathrm{Re}\,\langle x, y\rangle}{\|x\|\,\|y\|}.$$

(This uniquely defines $\theta$ if we insist that $0 \le \theta \le \pi$.) In particular, we say that $x$ and $y$ are orthogonal if $\langle x, y\rangle = 0$. We will sometimes use the shorthand $x \perp y$ to denote that $x$ and $y$ are orthogonal. From our earlier remarks, every vector is orthogonal to the zero vector (and the zero vector is the only vector with this property).

The observation that

$$\|x + y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \qquad (2)$$

(which is the law of cosines in disguise!) leads to two important results.
The Pythagorean Theorem. $\|x + y\|^2 = \|x\|^2 + \|y\|^2$ if and only if $\mathrm{Re}\,\langle x, y\rangle = 0$; in particular, whenever $\langle x, y\rangle = 0$.

The Triangle Inequality. $\|x + y\| \le \|x\| + \|y\|$.

Proof. From (2) and the Cauchy-Schwarz inequality,

$$\|x + y\|^2 \le \|x\|^2 + 2\,|\langle x, y\rangle| + \|y\|^2 \le \|x\|^2 + 2\,\|x\|\,\|y\| + \|y\|^2 = \big(\|x\| + \|y\|\big)^2.$$

Corollary. $\|x\| = \sqrt{\langle x, x\rangle}$ defines a norm on $X$.


At the risk of a bit of repetition, let’s take another look at the proof of the Cauchy-Schwarz inequality.


Lemma. Let $x$, $y \in X$ with $y \ne 0$ and let $\alpha = \langle x, y\rangle/\langle y, y\rangle$. Then:

(i) $(x - \alpha y) \perp y$ and, of course, $x = (x - \alpha y) + \alpha y$. Thus, $x$ can be written as the sum of a vector orthogonal to $y$ and a vector parallel to $y$.

(ii) $\|x\|^2 = \|x - \alpha y\|^2 + |\alpha|^2\|y\|^2$. In particular, note that $\|x\| \ge \|x - \alpha y\|$.

(iii) $\|x - \alpha y\| < \|x - \beta y\|$ for all $\beta \ne \alpha$. Thus, $y^* = \alpha y$ is the unique point in $\mathrm{span}\,y$ nearest to $x$; it is characterized by the requirement that $(x - y^*) \perp \mathrm{span}\,y$.

Proof. (i) follows by design; that is, $\alpha$ is chosen to satisfy $\langle x - \alpha y, y\rangle = 0$. (ii) follows easily from (i) and the Pythagorean theorem. Likewise, (iii) follows from (i) and the Pythagorean theorem. Indeed, from (i) and the linearity of the inner product, it follows that $x - \alpha y$ is perpendicular to every multiple of $y$; thus, given $\beta \in \mathbb{C}$, we have

$$\|x - \beta y\|^2 = \|(x - \alpha y) + (\alpha - \beta)y\|^2 = \|x - \alpha y\|^2 + |\alpha - \beta|^2\,\|y\|^2 > \|x - \alpha y\|^2$$

unless $\beta = \alpha$.
The vector $y^* = \alpha y$ in our previous Lemma is called the orthogonal projection of $x$ along $y$; it is the shadow of $x$ onto $\mathrm{span}\,y$ along a perpendicular “line of sight.” In calculus, it is often called the component of $x$ in the direction of $y$.

As we’ll see directly, if we inductively apply the Lemma to a basis for a (finite-dimensional) subspace $Y$ of $X$, we’ll arrive at a technique for building an orthogonal basis (that is, a basis consisting of mutually orthogonal vectors). The benefit in having an orthogonal basis is illustrated by our next result.


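Lemma. Let $\{e_1, \ldots, e_n\}$ be a set of nonzero, mutually orthogonal vectors in $X$ and let $E = \mathrm{span}\{e_1, \ldots, e_n\}$. Then

(i) $\{e_1, \ldots, e_n\}$ is linearly independent; in fact, $z \in E$ if and only if $z = \sum_{i=1}^n \frac{\langle z, e_i\rangle}{\langle e_i, e_i\rangle}\,e_i$. Thus, $\{e_1, \ldots, e_n\}$ is a basis for $E$.

(ii) Given any $x \in X$, the vector $x - \sum_{i=1}^n \frac{\langle x, e_i\rangle}{\langle e_i, e_i\rangle}\,e_i$ is orthogonal to $E$.

(iii) $e^* = \sum_{j=1}^n \frac{\langle x, e_j\rangle}{\langle e_j, e_j\rangle}\,e_j$ is the unique nearest point to $x$ in $E$.

(iv) $x \in E$ if and only if $\|x\|^2 = \sum_{j=1}^n \frac{|\langle x, e_j\rangle|^2}{\langle e_j, e_j\rangle} = \|e^*\|^2$.

Proof. To begin, if we set $z = \sum_{i=1}^n \alpha_i e_i$, then

$$\langle z, e_k\rangle = \sum_{i=1}^n \alpha_i\langle e_i, e_k\rangle = \alpha_k\langle e_k, e_k\rangle,$$

which proves (i). Next, given $x \in X$, set $y = x - \sum_{i=1}^n \frac{\langle x, e_i\rangle}{\langle e_i, e_i\rangle}\,e_i$. A similar calculation then yields

$$\langle y, e_k\rangle = \langle x, e_k\rangle - \langle x, e_k\rangle = 0,$$

and it follows that $y$ is orthogonal to every vector in $E$. Parts (iii) and (iv) are now almost immediate. Indeed, from (ii), $x - e^*$ is orthogonal to $E$. Thus, given any vector $e \in E$ we have $e - e^* \in E$ and, hence,

$$\|x - e\|^2 = \|(x - e^*) + (e^* - e)\|^2 = \|x - e^*\|^2 + \|e - e^*\|^2 > \|x - e^*\|^2,$$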
unless $e = e^*$. Finally, $x \in E$ if and only if $x = e^*$. But, from (ii), we have $\|x\|^2 = \|x - e^*\|^2 + \|e^*\|^2$. Thus, $x = e^*$ if and only if $\|x\|^2 = \|e^*\|^2 = \sum_{j=1}^n \frac{|\langle x, e_j\rangle|^2}{\langle e_j, e_j\rangle}$.

The vector $e^*$ of our previous Lemma is the orthogonal projection of $x$ onto $E$. Again, it is characterized by the requirement that $(x - e^*) \perp E$ (as in (ii) of the Lemma).
Again, let’s pause to examine our first two examples.

Examples

1. The usual basis $e_k = (0, \ldots, 0, 1, 0, \ldots, 0)$, where the single nonzero entry is in the $k$-th coordinate, is an orthonormal basis for $\mathbb{C}^n$. That is, the $e_k$ are not only mutually orthogonal ($\langle e_i, e_j\rangle = 0$ for $i \ne j$) but also norm one ($\langle e_j, e_j\rangle = 1$ for all $j$). Note that every vector $x = (x_1, \ldots, x_n) = \sum_{k=1}^n \langle x, e_k\rangle\,e_k$ has norm $\|x\|^2 = \sum_{k=1}^n |x_k|^2$.

2. Relative to the inner product $\langle f, g\rangle = \frac{1}{2\pi}\int_0^{2\pi} f(x)\,\overline{g(x)}\,dx$, the vectors $\{1, e^{ix}, \ldots, e^{inx}\}$ are orthonormal. Indeed,

$$\int_0^{2\pi} e^{imx}\,\overline{e^{inx}}\,dx = \int_0^{2\pi} e^{i(m-n)x}\,dx = \begin{cases} 0, & \text{if } m \ne n \\ 2\pi, & \text{if } m = n. \end{cases}$$

In the case of real scalars, real-valued functions, and the inner product $\langle f, g\rangle = \frac{1}{\pi}\int_0^{2\pi} f(x)\,g(x)\,dx$, the vectors $\{1/\sqrt{2}, \cos x, \sin x, \ldots, \cos nx, \sin nx\}$ are orthonormal.

We next show how the ideas developed in our first two lemmas can be used to construct orthogonal sequences.
The Gram-Schmidt Process. Let $E$ be a subspace of $X$ with basis $\{x_1, \ldots, x_n\}$. Then we can find an orthogonal basis $\{e_1, \ldots, e_n\}$ for $E$ satisfying
(i) $\mathrm{span}\{e_1, \ldots, e_k\} = \mathrm{span}\{x_1, \ldots, x_k\}$ for $k = 1, \ldots, n$, and
(ii) $0 < \|e_k\| \le \|x_k\|$ for $k = 1, \ldots, n$.

Proof. To begin, set $e_1 = x_1 \ne 0$ and let $e_2 = x_2 - \frac{\langle x_2, e_1\rangle}{\langle e_1, e_1\rangle}\,e_1$. Because $x_2 \notin \mathrm{span}\{e_1\} = \mathrm{span}\{x_1\}$, we have $e_2 \ne 0$. Of course, $e_2 \in \mathrm{span}\{e_1, x_2\} = \mathrm{span}\{x_1, x_2\}$ and so we have $\mathrm{span}\{e_1, e_2\} \subset \mathrm{span}\{x_1, x_2\}$. But, from the previous Lemma, $e_2$ is orthogonal to $\mathrm{span}\{e_1\} = \mathrm{span}\{x_1\}$. In particular, $e_1$ and $e_2$ are linearly independent and it follows that $\mathrm{span}\{e_1, e_2\} = \mathrm{span}\{x_1, x_2\}$. To see that $e_2$ satisfies (ii), notice that $x_2 = e_2 + \alpha e_1$ and, hence, $\|x_2\|^2 = \|e_2\|^2 + |\alpha|^2\|e_1\|^2 \ge \|e_2\|^2$.

Continue by induction: Assuming that $\{e_1, \ldots, e_k\}$ have been chosen, set $e_{k+1} = x_{k+1} - \sum_{j=1}^k \frac{\langle x_{k+1}, e_j\rangle}{\langle e_j, e_j\rangle}\,e_j$. Because $x_{k+1} \notin \mathrm{span}\{x_1, \ldots, x_k\} = \mathrm{span}\{e_1, \ldots, e_k\}$, we have $e_{k+1} \ne 0$. Also, $\mathrm{span}\{e_1, \ldots, e_k, e_{k+1}\} \subset \mathrm{span}\{x_1, \ldots, x_k, x_{k+1}\}$. But, as before, $e_{k+1}$ is orthogonal to $\mathrm{span}\{e_1, \ldots, e_k\}$ and, hence, $\{e_1, \ldots, e_k, e_{k+1}\}$ are linearly independent. Thus we must have $\mathrm{span}\{e_1, \ldots, e_k, e_{k+1}\} = \mathrm{span}\{x_1, \ldots, x_k, x_{k+1}\}$. Finally, $\|x_{k+1}\|^2 = \|e_{k+1}\|^2 + \sum_{j=1}^k |\alpha_j|^2\|e_j\|^2 \ge \|e_{k+1}\|^2$.

Clearly, once we have an orthogonal basis, we simply normalize to arrive at an orthonormal basis (alternatively, by slightly altering the process outlined above, we could construct the vectors $e_k$ to have norm one).
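
The inductive construction translates almost line for line into code. The following is a minimal sketch for vectors in $\mathbb{C}^n$ stored as Python lists; the helper names `inner`, `norm`, and `gram_schmidt` and the sample basis are illustrative assumptions, and no attempt is made at the numerical safeguards a production routine would need.

```python
def inner(x, y):
    """Usual inner product on C^n: sum of x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def gram_schmidt(basis):
    """Orthogonalize a linearly independent list of vectors, as in the proof:
    e_{k+1} = x_{k+1} - sum_j <x_{k+1}, e_j> / <e_j, e_j> * e_j."""
    es = []
    for x in basis:
        e = list(x)
        for prev in es:
            coeff = inner(x, prev) / inner(prev, prev)
            e = [ei - coeff * pi for ei, pi in zip(e, prev)]
        es.append(e)
    return es

# Hypothetical basis of C^3.
xs = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]
es = gram_schmidt(xs)

# Normalizing afterwards yields an orthonormal basis.
orthonormal = [[c / norm(e) for c in e] for e in es]
```

Property (ii) of the process, $0 < \|e_k\| \le \|x_k\|$, can be read off by comparing `norm(e)` with `norm(x)` for corresponding vectors.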


Corollary. If E is a finite-dimensional subspace of X, then E’ has an orthonormal basis 


{e1, wee 2en bs 
All of these ideas make very short work of: 
The Projection Theorem. Let $E$ be a subspace of $X$ with orthonormal basis $\{e_1,\dots,e_n\}$. Define $P : X \to X$ by $P(x) = \sum_{k=1}^{n} \langle x, e_k\rangle\, e_k$ for $x \in X$. Then
(i) $P(x) \in E$ and $(x - P(x)) \perp E$; hence, $\|x\|^2 = \|x - P(x)\|^2 + \|P(x)\|^2$.
(ii) $P(x)$ is the unique nearest point to $x$ in $E$.
(iii) $P$ is a projection; that is, $P^2 = P$.
(iv) $P$ is linear, continuous, and satisfies $\|P(x)\| \le \|x\|$ for all $x \in X$.


$P$ is called the orthogonal projection onto $E$ or, sometimes, the nearest point map on $E$. It follows from (ii) that $P$ is actually independent of the choice of basis.

Hadamard's Inequality


As an application of the ideas in this section, we next present a classical matrix inequality 


due to Hadamard in 1893. 


Theorem. Let $A = [a_{ij}]$ be a real or complex $n \times n$ matrix and let $A_j = (a_{ij})_{i=1}^{n}$ denote the $j$-th column of $A$. Then
\[ |\det A| \le \prod_{j=1}^{n} \|A_j\| = \prod_{j=1}^{n} \Bigl(\sum_{i=1}^{n} |a_{ij}|^2\Bigr)^{1/2}. \]
Equality can only occur if one of the columns $A_j = 0$ or if the columns are orthogonal; i.e., $\langle A_j, A_k\rangle = 0$ for $j \ne k$.

It is well known that $|\det A|$ represents the volume of the parallelepiped with edges $(A_j)$. Thus, Hadamard's inequality states that the volume is maximal when the edges are orthogonal.

Proof. We may certainly suppose that $\det A \ne 0$, in which case the columns $(A_j)$ are linearly independent. Thus we may apply the Gram-Schmidt process to orthogonalize them, arriving at vectors $(b_j)$ satisfying $b_1 = A_1$,
\[ b_j = A_j - \sum_{k=1}^{j-1} \frac{\langle A_j, b_k\rangle}{\langle b_k, b_k\rangle}\, b_k \quad (j \ge 2), \tag{3} \]
and $\|b_j\| \le \|A_j\|$ for all $j$. In particular, the matrix $B = [b_1 \cdots b_n]$ with columns $(b_j)$ can be obtained from $A$ by elementary column operations. It follows that $\det B = \det A$.

But the columns of $B$ are orthogonal, so
\[ B^*B = \operatorname{diag}\bigl(\|b_1\|^2, \dots, \|b_n\|^2\bigr) \]
and, hence, $\det(B^*B) = \prod_{j=1}^{n} \|b_j\|^2$. On the other hand, $\det(B^*B) = \det(B^*)\det B = |\det B|^2$. Consequently,
\[ |\det A| = |\det B| = \bigl(\det(B^*B)\bigr)^{1/2} = \prod_{j=1}^{n} \|b_j\| \le \prod_{j=1}^{n} \|A_j\|. \]
Equality can only occur if $\|b_j\| = \|A_j\|$ for all $j$, which can only occur if $b_j = A_j$ for all $j$; that is, if and only if the $(A_j)$ were already orthogonal.

Corollary. Suppose that $A$ is an $n \times n$ matrix with real or complex entries satisfying $|a_{ij}| \le 1$ for all $i$, $j$. Then $|\det A| \le n^{n/2}$ with equality if and only if $|a_{ij}| = 1$ for all $i$, $j$ and the columns of $A$ are orthogonal.

Proof. In the notation of the previous Theorem, we have $\|A_j\| \le \sqrt{n}$, thus $|\det A| \le n^{n/2}$. Equality would mean that the $A_j$ are orthogonal and it would also mean that $\|A_j\| = \sqrt{n}$, which forces $|a_{ij}| = 1$ for all $i$, $j$.
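Hadamard's inequality is easy to test numerically. The sketch below is a sanity check rather than part of the proof; the matrices `A` and `H` and the helper names are illustrative assumptions. It compares $|\det A|$ with the product of the column norms for a sample real matrix, and confirms the equality case for a matrix with orthogonal columns.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def column_norm_product(m):
    """Product of the Euclidean norms of the columns of a real matrix."""
    n = len(m)
    return math.prod(
        math.sqrt(sum(m[i][j] ** 2 for i in range(n))) for j in range(n)
    )

# Hypothetical sample matrix: strict inequality is expected here.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
# Orthogonal columns with entries +-1: equality, and |det H| = n^(n/2) with n = 2.
H = [[1.0, 1.0], [1.0, -1.0]]

print(abs(det(A)), column_norm_product(A))
print(abs(det(H)), column_norm_product(H))
```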


An $n \times n$ real matrix having entries $a_{ij} = \pm 1$ and orthogonal columns is called a Hadamard matrix. For example,
\[ A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]
is a $2 \times 2$ Hadamard matrix. As it happens, it's very easy to construct Hadamard matrices of order $2^n$. (And not quite so easy, nor indeed always possible, to construct Hadamard matrices of other orders. For example, there is no Hadamard matrix of order 3.) The key is the so-called Kronecker product, which we will define by example: The Kronecker product of $A$ with itself is
\[ A \otimes A = \begin{bmatrix} 1\cdot A & 1\cdot A \\ 1\cdot A & -1\cdot A \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}, \]
which is a $4 \times 4$ Hadamard matrix. Writing $B = A \otimes A$, the product $A \otimes B$ would then yield an $8 \times 8$ Hadamard matrix, and so on.
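The Kronecker construction translates directly into code. The following is a minimal sketch (the function names `kron` and `is_hadamard` are ours) that builds the $4 \times 4$ and $8 \times 8$ matrices from $A$ and checks the two defining properties of a Hadamard matrix.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def is_hadamard(h):
    """True if h is square with entries +-1 and mutually orthogonal columns."""
    n = len(h)
    entries_ok = all(len(row) == n and all(x in (1, -1) for x in row) for row in h)
    if not entries_ok:
        return False
    return all(
        sum(h[i][j] * h[i][k] for i in range(n)) == (n if j == k else 0)
        for j in range(n)
        for k in range(n)
    )

A = [[1, 1], [1, -1]]
H4 = kron(A, A)    # order 4
H8 = kron(A, H4)   # order 8

for row in H4:
    print(row)
print(is_hadamard(H4), is_hadamard(H8))
```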




Problem Set 5 


Throughout, $X$ denotes a finite-dimensional inner product space over $\mathbb{C}$ with inner product $\langle\cdot,\cdot\rangle$ and with associated norm $\|x\| = \sqrt{\langle x, x\rangle}$.

1. Show that $|\langle x, y\rangle| = \|x\|\,\|y\|$ if and only if $x$ and $y$ are linearly dependent.

2. Show that $\|x + y\| = \|x\| + \|y\|$ if and only if one of $x$ or $y$ is a nonnegative multiple of the other.

3. Show that the parallelogram law: $\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)$ holds for any $x$, $y \in X$.

4. Show that the polarization identity: $4\langle x, y\rangle = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2$ holds for any $x$, $y \in X$.

5. Show that a linear map $T : X \to Y$ between inner product spaces $X$ and $Y$ is an isometry (into) if and only if it preserves inner products. That is, $\|Tx\| = \|x\|$ for all $x \in X$ if and only if $\langle Tx, Ty\rangle = \langle x, y\rangle$ for all $x$, $y \in X$. In short, a linear map preserves distances if and only if it preserves angles.

6. Let $E$ be a subspace of $X$ and let $x_0 \in X$. Show that the nearest point to $x_0$ in $E$ can be characterized as the (unique) point $y_0 \in E$ satisfying $\operatorname{Re}\langle x_0 - y_0, y - y_0\rangle \le 0$ for all $y \in E$.

7. Consider $C[-1,1]$ with the inner product $\langle f, g\rangle = \int_{-1}^{1} f(x)\,g(x)\,dx$ (where we consider real-valued functions and real scalars).
(i) Apply the Gram-Schmidt process to the linearly independent set $\{1, x, x^2\}$ to find the first three Legendre polynomials $P_1(x) = 1$, $P_2(x) = x$, and $P_3(x) = x^2 - \frac{1}{3}$.
(ii) Compute $\min\bigl\{\int_{-1}^{1} |e^x - (ax^2 + bx + c)|^2\,dx : a, b, c \text{ real}\bigr\}$.

8. Given a subset $A$ of $X$, let $A^\perp = \{x \in X : \langle x, a\rangle = 0 \text{ for all } a \in A\}$. The set $A^\perp$ is called the orthogonal complement of $A$. Verify the following for subsets $A$, $B$ of $X$.
(a) $A^\perp$ is a subspace of $X$ and $A \cap A^\perp \subset \{0\}$.
(b) $A \subset B \implies A^\perp \supset B^\perp$ and $A \subset B \implies A^{\perp\perp} \subset B^{\perp\perp}$.
(c) $A \subset A^{\perp\perp}$ and, hence, $\operatorname{span}(A) \subset A^{\perp\perp}$. (In fact, for $X$ finite-dimensional, we actually have $\operatorname{span}(A) = A^{\perp\perp}$.)

9. Let $E$ be a subspace of $X$ and let $P : X \to X$ be the orthogonal projection onto $E$. Show that $I - P$ is the orthogonal projection onto $E^\perp$ (where $I$ denotes the identity map on $X$).

10. Let $A$ be an $m \times n$ matrix over $\mathbb{C}$ and let $A^* = \overline{A}^{T}$ denote the conjugate transpose of $A$. Given $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^m$, show that $\langle Ax, y\rangle = \langle x, A^*y\rangle$. Moreover, this equation characterizes $A^*$; that is, if $B$ is an $n \times m$ matrix over $\mathbb{C}$ which satisfies $\langle Ax, y\rangle = \langle x, By\rangle$ for all $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^m$, show that $B = A^*$.

11. Let $A$ be an $m \times n$ matrix over $\mathbb{C}$.
(a) Prove that $(\operatorname{range} A)^\perp = \ker(A^*)$.
(b) If $b \in \mathbb{C}^m$ is such that the equation $Ax = b$ has no solution, we can always find a vector $x_0 \in \mathbb{C}^n$ that minimizes $\|b - Ax\|$ over all $x \in \mathbb{C}^n$. Prove that $x_0$ satisfies $(b - Ax_0) \perp \operatorname{range} A$.
(c) Prove that $x_0$ satisfies the so-called normal equation: $A^*Ax = A^*b$. (Note that $A^*A$ is Hermitian. If $A$ has rank $n$, then $A^*A$ will be invertible and we can use it to solve for $x_0 = (A^*A)^{-1}A^*b$.)

12. Let $M_n(\mathbb{C})$ denote the collection of all $n \times n$ matrices with complex entries. $M_n(\mathbb{C})$ is a vector space over $\mathbb{C}$ under "coordinatewise" addition and scalar multiplication. Show that the expression $\langle A, B\rangle = \operatorname{trace}(B^*A)$ defines an inner product on $M_n(\mathbb{C})$.

13. Let $A \in M_n(\mathbb{C})$.
(a) Show that $[x, y] = \langle Ax, y\rangle$ defines an inner product on $\mathbb{C}^n$ if and only if $A$ satisfies:
(i) $A^* = A$. (We say that $A$ is Hermitian or self-adjoint if $A^* = A$.)
(ii) $\langle Ax, x\rangle \ge 0$ for any $x$, and $\langle Ax, x\rangle = 0$ only for $x = 0$. (In other words, the quadratic form $f(x, y) = \langle Ax, y\rangle$ is positive definite; this implies that $A$ has strictly positive [real] eigenvalues.)
(b) Conversely, show that any sesquilinear, positive definite quadratic form on $\mathbb{C}^n$ is of the form $\langle Ax, y\rangle$ for some Hermitian matrix $A$.

Hilbert's Double Series Theorem


Theorem. If $(a_m)$ and $(b_n)$ are real, square-summable sequences, then
\[ \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} \frac{a_m b_n}{m+n} < \pi \Bigl(\sum_{m=1}^{\infty} a_m^2\Bigr)^{1/2} \Bigl(\sum_{n=1}^{\infty} b_n^2\Bigr)^{1/2} \tag{1} \]
unless one of $(a_m)$ or $(b_n)$ is identically zero. The constant $\pi$ is best possible.

It's unusual to see a strict inequality with a sharp bound! Hilbert proved this inequality, with constant $2\pi$, in his famous series of lectures on integral equations (roughly, 1904-1911); his result was first published by Weyl in 1908. The best constant, together with various generalizations, was provided by Schur in 1911. We will present two proofs, along with a generalization to integral inequalities.


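Our first approach uses the Cauchy-Schwarz inequality in the form:
\[ \sum_{k \in J} x_k y_k \le \Bigl(\sum_{k \in J} x_k^2\Bigr)^{1/2} \Bigl(\sum_{k \in J} y_k^2\Bigr)^{1/2}, \tag{2} \]
where $(x_k)$ and $(y_k)$ are real sequences and where $J$ is a countable set; in this case, $J = \mathbb{N} \times \mathbb{N}$. This is a relatively straightforward generalization of the $\mathbb{R}^n$ version because
\[ \sum_{k \in J} |x_k| = \sup\Bigl\{\sum_{k \in F} |x_k| : F \subset J,\ F \text{ finite}\Bigr\}. \]
That is, (2) follows from the finite-sum version of Cauchy-Schwarz by showing that all (finite length) partial sums satisfy (2).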


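The idea behind our first proof of Hilbert's theorem is to take advantage of the symmetry of the summand, relative to $m$ and $n$. We begin by rewriting:
\[ \frac{a_m b_n}{m+n} = \frac{a_m}{(m+n)^{1/2}} \Bigl(\frac{m}{n}\Bigr)^{\lambda} \cdot \frac{b_n}{(m+n)^{1/2}} \Bigl(\frac{n}{m}\Bigr)^{\lambda}, \]
where $\lambda$, if possible, will be chosen so that each of the factors on the right is square-summable (over the index set $J$). Applying Cauchy-Schwarz we get:
\[ \sum_{m,n} \frac{a_m b_n}{m+n} \le \Bigl(\sum_{m,n} \frac{a_m^2}{m+n} \Bigl(\frac{m}{n}\Bigr)^{2\lambda}\Bigr)^{1/2} \Bigl(\sum_{m,n} \frac{b_n^2}{m+n} \Bigl(\frac{n}{m}\Bigr)^{2\lambda}\Bigr)^{1/2}. \]
But
\[ \sum_{m,n} \frac{a_m^2}{m+n} \Bigl(\frac{m}{n}\Bigr)^{2\lambda} = \sum_{m} a_m^2 \sum_{n} \frac{1}{m+n} \Bigl(\frac{m}{n}\Bigr)^{2\lambda} \]
and, similarly,
\[ \sum_{m,n} \frac{b_n^2}{m+n} \Bigl(\frac{n}{m}\Bigr)^{2\lambda} = \sum_{n} b_n^2 \sum_{m} \frac{1}{m+n} \Bigl(\frac{n}{m}\Bigr)^{2\lambda}. \]
Thus, we'll have a proof of Hilbert's inequality (with some constant) if we can find a finite bound $B_\lambda$ such that
\[ \sum_{n=1}^{\infty} \frac{1}{m+n} \Bigl(\frac{m}{n}\Bigr)^{2\lambda} \le B_\lambda \]
for all $m$; that is, $B_\lambda$ may depend on $\lambda$, but not on $m$.

Now for $\lambda > 0$ (and any $m$), the sequence $\frac{1}{m+n}\bigl(\frac{m}{n}\bigr)^{2\lambda}$ decreases as $n$ increases, so we can appeal to the integral test:

Lemma. If $f : [0,\infty) \to [0,\infty)$ is strictly decreasing, then
\[ \int_1^{\infty} f(x)\,dx < \sum_{n=1}^{\infty} f(n) < \int_0^{\infty} f(x)\,dx. \]

Returning to our search for the bound $B_\lambda$, we find
\[ \sum_{n=1}^{\infty} \frac{1}{m+n} \Bigl(\frac{m}{n}\Bigr)^{2\lambda} < \int_0^{\infty} \frac{1}{m+x} \Bigl(\frac{m}{x}\Bigr)^{2\lambda} dx = \int_0^{\infty} \frac{dt}{(1+t)\,t^{2\lambda}}, \]
where the last step is the substitution $x = mt$. As we've seen (in a slightly different form), this integral exists provided that $0 < 2\lambda < 1$ and, in fact, equals $\Gamma(2\lambda)\Gamma(1-2\lambda)$; that is, we can take $B_\lambda = \Gamma(2\lambda)\Gamma(1-2\lambda)$. In other words, we've just shown that
\[ \sum_{m,n} \frac{a_m b_n}{m+n} < B_\lambda \Bigl(\sum_{m} a_m^2\Bigr)^{1/2} \Bigl(\sum_{n} b_n^2\Bigr)^{1/2}. \]
But we've also seen that the minimum value of $B_\lambda$ occurs when $\lambda = 1/4$, in which case $B_{1/4} = \bigl(\Gamma(1/2)\bigr)^2 = \pi$. This proves Hilbert's theorem, save the claim that $\pi$ is best possible.

Corollary. Let $(a_m)$ and $(b_n)$ be nonnegative, let $1 < p < \infty$, and let $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Then
\[ \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} \frac{a_m b_n}{m+n} < \Gamma\Bigl(\frac{1}{p}\Bigr)\Gamma\Bigl(\frac{1}{q}\Bigr) \Bigl(\sum_{m=1}^{\infty} a_m^p\Bigr)^{1/p} \Bigl(\sum_{n=1}^{\infty} b_n^q\Bigr)^{1/q} \]
unless one of $(a_m)$ or $(b_n)$ is identically zero.

Proof. In this case, we write
\[ \frac{a_m b_n}{m+n} = \frac{a_m}{(m+n)^{1/p}} \Bigl(\frac{m}{n}\Bigr)^{1/pq} \cdot \frac{b_n}{(m+n)^{1/q}} \Bigl(\frac{n}{m}\Bigr)^{1/pq} \]
and apply Hölder's inequality, which will lead to the bounds
\[ \int_0^{\infty} \frac{dt}{(1+t)\,t^{1/q}} = \Gamma\Bigl(\frac{1}{p}\Bigr)\Gamma\Bigl(\frac{1}{q}\Bigr) \quad\text{and}\quad \int_0^{\infty} \frac{dt}{(1+t)\,t^{1/p}} = \Gamma\Bigl(\frac{1}{p}\Bigr)\Gamma\Bigl(\frac{1}{q}\Bigr). \]

As it happens, $\Gamma(x)\Gamma(1-x) = \pi/\sin(\pi x)$ for $0 < x < 1$ and so the bound in the Corollary can be written as $\pi/\sin(\pi/p)$, which is actually best possible.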
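Both claims are easy to probe numerically. The sketch below is a sanity check only: the test sequence $a_m = m^{-1/2}$ truncated at $N = 300$ terms is our own choice, and the function names are hypothetical. It computes the ratio of the two sides of Hilbert's inequality for a finite sequence, which must stay below $\pi$, and compares $\Gamma(1/p)\Gamma(1/q)$ with $\pi/\sin(\pi/p)$.

```python
import math

def hilbert_ratio(a, b):
    """Double sum of a_m b_n / (m+n), divided by the product of the l2 norms."""
    lhs = sum(
        am * bn / (m + n)
        for m, am in enumerate(a, start=1)
        for n, bn in enumerate(b, start=1)
    )
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return lhs / (norm_a * norm_b)

def hilbert_constant(p):
    """Gamma(1/p) * Gamma(1/q), where 1/p + 1/q = 1."""
    return math.gamma(1.0 / p) * math.gamma(1.0 - 1.0 / p)

N = 300  # truncation length (an arbitrary illustrative choice)
a = [1.0 / math.sqrt(m) for m in range(1, N + 1)]
ratio = hilbert_ratio(a, a)

print(ratio, math.pi)
for p in (1.5, 2.0, 3.0):
    print(p, hilbert_constant(p), math.pi / math.sin(math.pi / p))
```

Slowly decaying sequences such as this one push the ratio toward $\pi$ as $N$ grows, which is the intuition behind the claim that the constant cannot be improved.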


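The integral version of Hilbert's inequality can be proved in essentially the same way as the discrete version and, in fact, will yield the discrete version as a corollary.

Theorem. Let $f$, $g : [0,\infty) \to [0,\infty)$, let $1 < p < \infty$, and let $\frac{1}{p} + \frac{1}{q} = 1$. Then
\[ \int_0^{\infty}\!\!\int_0^{\infty} \frac{f(x)\,g(y)}{x+y}\,dx\,dy < \frac{\pi}{\sin(\pi/p)} \Bigl(\int_0^{\infty} f(x)^p\,dx\Bigr)^{1/p} \Bigl(\int_0^{\infty} g(y)^q\,dy\Bigr)^{1/q} \]
unless one of $f$ or $g$ is identically zero. The constant $\pi/\sin(\pi/p)$ is best possible.

In this case we would write
\[ \frac{1}{x+y} = \frac{1}{(x+y)^{1/p}} \Bigl(\frac{x}{y}\Bigr)^{1/pq} \cdot \frac{1}{(x+y)^{1/q}} \Bigl(\frac{y}{x}\Bigr)^{1/pq} \]
and proceed as before. Instead of completing this calculation, let's opt for a slightly more general theorem:

Theorem. Let $K : [0,\infty) \times [0,\infty) \to [0,\infty)$ satisfy $K(\lambda x, \lambda y) = \lambda^{-1} K(x, y)$ for all $\lambda > 0$. Then, for $f$, $g$, and $p$ as above,
\[ \int_0^{\infty}\!\!\int_0^{\infty} K(x,y)\,f(x)\,g(y)\,dx\,dy \le C \Bigl(\int_0^{\infty} f(x)^p\,dx\Bigr)^{1/p} \Bigl(\int_0^{\infty} g(y)^q\,dy\Bigr)^{1/q}. \tag{4} \]
The constant $C$ is given by the common value
\[ C = \int_0^{\infty} K(x,1)\,x^{-1/p}\,dx = \int_0^{\infty} K(1,y)\,y^{-1/q}\,dy. \tag{5} \]
If $K > 0$, then the inequality is strict unless one of $f$ or $g$ is identically zero.

Proof. We begin with the change of variable $y = ux$ (and a few changes in the order of integration).
\[
\begin{aligned}
I = \int_0^{\infty}\!\!\int_0^{\infty} K(x,y)\,f(x)\,g(y)\,dx\,dy
&= \int_0^{\infty} f(x) \Bigl[\int_0^{\infty} K(x,y)\,g(y)\,dy\Bigr] dx \\
&= \int_0^{\infty} f(x) \Bigl[\int_0^{\infty} K(x,ux)\,g(ux)\,x\,du\Bigr] dx \\
&= \int_0^{\infty} f(x) \Bigl[\int_0^{\infty} K(1,u)\,g(ux)\,du\Bigr] dx \\
&= \int_0^{\infty} K(1,u) \Bigl[\int_0^{\infty} f(x)\,g(ux)\,dx\Bigr] du.
\end{aligned}
\]
We now apply Hölder's inequality to the inner integral, using the same change of variable $y = ux$:
\[ \int_0^{\infty} f(x)\,g(ux)\,dx \le \Bigl(\int_0^{\infty} f(x)^p\,dx\Bigr)^{1/p} \Bigl(\int_0^{\infty} g(ux)^q\,dx\Bigr)^{1/q} = u^{-1/q} \Bigl(\int_0^{\infty} f(x)^p\,dx\Bigr)^{1/p} \Bigl(\int_0^{\infty} g(y)^q\,dy\Bigr)^{1/q}. \]
Thus,
\[ I \le \int_0^{\infty} K(1,u)\,u^{-1/q}\,du\, \Bigl(\int_0^{\infty} f(x)^p\,dx\Bigr)^{1/p} \Bigl(\int_0^{\infty} g(y)^q\,dy\Bigr)^{1/q}. \]
The case for strict inequality in (4) follows from a careful examination of the case for equality in Hölder's inequality (which we will forego). The fact that the two integrals in (5) are equal is left as an exercise.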


Now the integral version of Hilbert's inequality can be used to deduce the discrete version; indeed, because we have
\[ \int_{m-1}^{m}\!\int_{n-1}^{n} \frac{dy\,dx}{x+y} > \frac{1}{m+n}, \]
we could define $f(x) = a_m$ for $m-1 \le x < m$, $g(y) = b_n$ for $n-1 \le y < n$, and bound the double sum in (1) by a double integral over $[0,\infty)^2$.

But we can actually do a bit better, arriving at an even sharper version of (1). To see this, note that the function $h(\alpha) = (m+n-1+\alpha)^{-1}$ is strictly convex for $-1 < \alpha < 1$; thus, for $\alpha \ne 0$,
\[ \frac{1}{m+n-1-\alpha} + \frac{1}{m+n-1+\alpha} > \frac{2}{m+n-1}. \]
Now watch closely!
\[ \int_{m-1}^{m}\!\int_{n-1}^{n} \frac{dy\,dx}{x+y} = \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} \frac{dy\,dx}{m+n-1+x+y} = \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} \frac{dy\,dx}{m+n-1-x-y}. \]
(Where we first exchanged $x$ and $y$ for $x + m - 1/2$ and $y + n - 1/2$, then exchanged $x$ and $y$ for $-x$ and $-y$.) All three integrals are equal and the second two average to be strictly bigger than $(m+n-1)^{-1}$; thus we have
\[ \int_{m-1}^{m}\!\int_{n-1}^{n} \frac{dy\,dx}{x+y} > \frac{1}{m+n-1}. \]
Replacing $m$ and $n$ by $m+1$ and $n+1$ leads to the following sharper version of Hilbert's inequality.

Corollary. If $(a_m)$ and $(b_n)$ are real, square-summable sequences, then
\[ \sum_{m=0}^{\infty}\sum_{n=0}^{\infty} \frac{a_m b_n}{m+n+1} < \pi \Bigl(\sum_{m=0}^{\infty} a_m^2\Bigr)^{1/2} \Bigl(\sum_{n=0}^{\infty} b_n^2\Bigr)^{1/2} \]
unless one of $(a_m)$ or $(b_n)$ is identically zero.
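The cell integral used above has a closed form, which makes the key estimate easy to check. Integrating first in $y$ and then in $x$ (our own calculation, not taken from the notes) gives $\int_{m-1}^{m}\int_{n-1}^{n} \frac{dy\,dx}{x+y} = \psi(c+1) - 2\psi(c) + \psi(c-1)$ with $c = m+n-1$ and $\psi(s) = s\log s$, where $\psi(0) = 0$. The sketch below evaluates this expression and compares it with $1/(m+n-1)$ and $1/(m+n)$; the function names are hypothetical.

```python
import math

def xlogx(s):
    """s * log(s), extended by continuity to 0 at s = 0."""
    return s * math.log(s) if s > 0 else 0.0

def cell_integral(m, n):
    """Closed form of the integral of 1/(x+y) over [m-1, m] x [n-1, n]."""
    c = m + n - 1
    return xlogx(c + 1) - 2 * xlogx(c) + xlogx(c - 1)

for m, n in [(1, 1), (1, 2), (3, 5)]:
    c = m + n - 1
    print(m, n, cell_integral(m, n), 1.0 / c, 1.0 / (c + 1))
```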


We now apply our Corollary to the moment sequence of a square-integrable function. For this we'll find it helpful to have a converse to Hölder's inequality, a result that's useful in its own right.

Lemma. Let $1 < p < \infty$ and let $\frac{1}{p} + \frac{1}{q} = 1$. Suppose we're given a real or complex sequence $(a_n)$ and a positive constant $C$ that satisfy:
\[ \Bigl|\sum_{n=1}^{\infty} a_n b_n\Bigr| \le C \Bigl(\sum_{n=1}^{\infty} |b_n|^q\Bigr)^{1/q} \]
whenever $(b_n)$ is a $q$-th power summable sequence. Then $(a_n)$ is $p$-th power summable and, moreover, $\sum_{n=1}^{\infty} |a_n|^p \le C^p$.

Proof. It suffices to show that $\sum_{n=1}^{N} |a_n|^p \le C^p$ for all $N$. But setting $b_n = |a_n|^{p-1}\,\overline{\operatorname{sgn} a_n}$, for $n = 1,\dots,N$ (where $\operatorname{sgn} z = z/|z|$ for $z \ne 0$ and $\operatorname{sgn} 0 = 0$), and $b_n = 0$ otherwise, we have $a_n b_n = |a_n|^p$ and $|b_n|^q = |a_n|^p$, for $n = 1,\dots,N$. Thus,
\[ \sum_{n=1}^{N} a_n b_n = \sum_{n=1}^{N} |a_n|^p \le C \Bigl(\sum_{n=1}^{N} |a_n|^p\Bigr)^{1/q}. \]
Dividing by $\bigl(\sum_{n=1}^{N} |a_n|^p\bigr)^{1/q}$ (which we may suppose is nonzero), we get $\bigl(\sum_{n=1}^{N} |a_n|^p\bigr)^{1/p} \le C$.


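Corollary. Let $f : [0,1] \to \mathbb{R}$ be square-integrable and nonzero. For each $n = 0, 1, 2, \dots$, define $a_n = \int_0^1 x^n f(x)\,dx$. Then
\[ \sum_{n=0}^{\infty} a_n^2 \le \pi \int_0^1 f(x)^2\,dx. \]
The constant $\pi$ is best possible.

Proof. We may suppose that $f \ge 0$. Indeed,
\[ |a_n| \le \int_0^1 x^n |f(x)|\,dx \]
(the moment sequence for $|f|$), so it would suffice to consider $|f|$.

Now if $(b_n)$ is any square-summable sequence, then, for all $N$, we have
\[
\begin{aligned}
\sum_{n=0}^{N} a_n b_n = \sum_{n=0}^{N} b_n \int_0^1 x^n f(x)\,dx
&= \int_0^1 f(x) \sum_{n=0}^{N} b_n x^n\,dx \\
&\le \Bigl(\int_0^1 f(x)^2\,dx\Bigr)^{1/2} \Bigl(\int_0^1 \Bigl(\sum_{n=0}^{N} b_n x^n\Bigr)^2 dx\Bigr)^{1/2} \\
&= \Bigl(\int_0^1 f(x)^2\,dx\Bigr)^{1/2} \Bigl(\sum_{m=0}^{N}\sum_{n=0}^{N} \frac{b_m b_n}{m+n+1}\Bigr)^{1/2} \\
&\le \Bigl(\int_0^1 f(x)^2\,dx\Bigr)^{1/2} \Bigl(\pi \sum_{n=0}^{N} b_n^2\Bigr)^{1/2},
\end{aligned}
\]
where the last step is the previous Corollary. From the converse to Hölder's inequality (or, in this case, the Cauchy-Schwarz inequality), it follows that $(a_n)$ is square-summable and satisfies $\sum_{n=0}^{\infty} a_n^2 \le \pi \int_0^1 f(x)^2\,dx$. The fact that $\pi$ is the best constant follows from considering the function $f(x) = (1-x)^{\varepsilon - 1/2}$ (and letting $\varepsilon \to 0^+$), a calculation we will omit.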


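Our second proof of Hilbert's inequality has the advantage of being entirely elementary, but has the disadvantage of giving a slightly weaker result (given our present state of knowledge). The proof, due to Toeplitz in 1910, begins with a simple observation.

Lemma. $\displaystyle \frac{1}{2\pi}\int_0^{2\pi} (t-\pi)\,e^{int}\,dt = \frac{1}{in}$ for $n \in \mathbb{Z}$, $n \ne 0$.

Proof. Integrate by parts:
\[ \frac{1}{2\pi}\int_0^{2\pi} (t-\pi)\,e^{int}\,dt = \frac{1}{2\pi}\Bigl[\frac{(t-\pi)\,e^{int}}{in}\Bigr]_0^{2\pi} - \frac{1}{2\pi}\int_0^{2\pi} \frac{e^{int}}{in}\,dt = \frac{1}{in}. \]

This simple observation gives us an integral representation for the left-hand side of Hilbert's inequality.

Lemma. $\displaystyle \sum_{m=1}^{N}\sum_{n=1}^{N} \frac{a_m b_n}{m+n} = \frac{i}{2\pi}\int_0^{2\pi} (t-\pi)\,f(t)\,g(t)\,dt$, where $f(t) = \sum_{m=1}^{N} a_m e^{imt}$ and $g(t) = \sum_{n=1}^{N} b_n e^{int}$.

Finally, we apply the Cauchy-Schwarz inequality (and the fact that the functions $e^{imt}$ are orthonormal on $[0,2\pi]$) to arrive at:
\[
\begin{aligned}
\Bigl|\sum_{m=1}^{N}\sum_{n=1}^{N} \frac{a_m b_n}{m+n}\Bigr|
&= \Bigl|\frac{1}{2\pi}\int_0^{2\pi} (t-\pi)\,f(t)\,g(t)\,dt\Bigr| \\
&\le \pi \cdot \frac{1}{2\pi}\int_0^{2\pi} |f(t)|\,|g(t)|\,dt \\
&\le \pi \Bigl(\frac{1}{2\pi}\int_0^{2\pi} |f(t)|^2\,dt\Bigr)^{1/2} \Bigl(\frac{1}{2\pi}\int_0^{2\pi} |g(t)|^2\,dt\Bigr)^{1/2} \\
&= \pi \Bigl(\sum_{m=1}^{N} |a_m|^2\Bigr)^{1/2} \Bigl(\sum_{n=1}^{N} |b_n|^2\Bigr)^{1/2}.
\end{aligned}
\]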


Because of the special nature of the representation in our previous Lemma, it’s not so 
clear that this proof will yield the infinite sum version (unless we assume that (a,,) and 
(b,) are nonnegative). Moreover, we would be hard pressed to prove strict inequality by 
this method alone. Nevertheless, Toeplitz’s proof is not only simple, but has the benefit 
of amply demonstrating the interplay between “discrete” and “continuous” inner product 


spaces. 
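Toeplitz's opening lemma is a convenient place to see the "continuous" side at work numerically. The following sketch approximates $\frac{1}{2\pi}\int_0^{2\pi}(t-\pi)e^{int}\,dt$ by a midpoint rule and compares it with $1/(in)$; the step count and the function name are arbitrary choices of ours, and the agreement is only up to quadrature error.

```python
import cmath
import math

def toeplitz_integral(n, steps=20000):
    """Midpoint-rule value of (1/(2*pi)) * integral_0^{2*pi} (t - pi) e^{int} dt."""
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        t = (k + 0.5) * h
        total += (t - math.pi) * cmath.exp(1j * n * t)
    return total * h / (2 * math.pi)

for n in (1, 2, 5, -3):
    print(n, toeplitz_integral(n), 1 / (1j * n))
```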


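Our final result in this section is Schur's generalization of Hilbert's inequality from 1911. Surprisingly, the proof we will give is short, elementary, and does not appeal in any way to Hilbert's inequality.

Lemma. Let $c_k$ be a complex number, let $\lambda_k$ be an integer, and let $0 < \alpha < 1$. Then
\[ \int_0^{2\pi} \Bigl|\sum_{k=1}^{n} c_k e^{i\lambda_k x}\Bigr|\,dx \ge 2\sin\pi\alpha\, \Bigl|\sum_{k=1}^{n} \frac{c_k}{\lambda_k - \alpha}\Bigr|. \]

Proof. We begin with the estimate
\[
\begin{aligned}
\int_0^{2\pi} \Bigl|\sum_{k=1}^{n} c_k e^{i\lambda_k x}\Bigr|\,dx
&= \int_0^{2\pi} \Bigl|\sum_{k=1}^{n} c_k e^{i(\lambda_k - \alpha)x}\Bigr|\,dx \\
&\ge \Bigl|\int_0^{2\pi} \sum_{k=1}^{n} c_k e^{i(\lambda_k - \alpha)x}\,dx\Bigr| \\
&= \Bigl|\sum_{k=1}^{n} c_k \int_0^{2\pi} e^{i(\lambda_k - \alpha)x}\,dx\Bigr| \\
&= \Bigl|\sum_{k=1}^{n} \frac{c_k}{\lambda_k - \alpha} \int_0^{2\pi(\lambda_k - \alpha)} e^{iu}\,du\Bigr|.
\end{aligned}
\]
But the value of the last integral depends only on $\alpha$:
\[
\begin{aligned}
\int_0^{2\pi(\lambda_k - \alpha)} e^{iu}\,du
&= -i\,e^{iu}\Big|_0^{2\pi(\lambda_k - \alpha)} \\
&= -i\bigl[e^{2\pi i(\lambda_k - \alpha)} - 1\bigr] \\
&= -i\bigl[e^{-2\pi i\alpha} - 1\bigr] \\
&= -i\,e^{-\pi i\alpha}\bigl[e^{-\pi i\alpha} - e^{\pi i\alpha}\bigr] \\
&= -2e^{-\pi i\alpha}\sin\pi\alpha.
\end{aligned}
\]
Thus, $\bigl|\int_0^{2\pi(\lambda_k - \alpha)} e^{iu}\,du\bigr| = 2\sin\pi\alpha$ and the result follows.

Theorem. Let $a_m$ and $b_n$ be complex numbers and let $0 < \alpha < 1$. Then
\[ \Bigl|\sum_{m=1}^{N}\sum_{n=1}^{N} \frac{a_m b_n}{m+n-\alpha}\Bigr| \le \frac{\pi}{\sin\pi\alpha} \Bigl(\sum_{m=1}^{N} |a_m|^2\Bigr)^{1/2} \Bigl(\sum_{n=1}^{N} |b_n|^2\Bigr)^{1/2}. \]

Proof. This is a simple matter of applying the Cauchy-Schwarz inequality and appealing to our previous Lemma. First, we essentially repeat a calculation from Toeplitz's proof:
\[ \int_0^{2\pi} \Bigl|\sum_{m=1}^{N} a_m e^{imx}\Bigr|\,\Bigl|\sum_{n=1}^{N} b_n e^{inx}\Bigr|\,dx \le 2\pi \Bigl(\sum_{m=1}^{N} |a_m|^2\Bigr)^{1/2} \Bigl(\sum_{n=1}^{N} |b_n|^2\Bigr)^{1/2}. \tag{6} \]
On the other hand, from our previous Lemma,
\[
\begin{aligned}
\int_0^{2\pi} \Bigl|\sum_{m=1}^{N} a_m e^{imx}\Bigr|\,\Bigl|\sum_{n=1}^{N} b_n e^{inx}\Bigr|\,dx
&= \int_0^{2\pi} \Bigl|\sum_{m=1}^{N}\sum_{n=1}^{N} a_m b_n e^{i(m+n)x}\Bigr|\,dx \\
&\ge 2\sin\pi\alpha\, \Bigl|\sum_{m=1}^{N}\sum_{n=1}^{N} \frac{a_m b_n}{m+n-\alpha}\Bigr|
\end{aligned}
\tag{7}
\]
(by taking $c_k = a_m b_n$ and $\lambda_k = m+n$ in our previous Lemma). Combining (6) and (7) completes the proof.

Hardy's Inequality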


In his search for a new proof of Hilbert's double series theorem, Hardy discovered the following inequalities (in 1920, but without the best constant in Theorem 1; this was later rectified by Landau in 1926).

Theorem 1. Let $1 < p < \infty$, let $(a_n)$ be nonnegative and $p$-th power summable, and let $A_n = a_1 + a_2 + \cdots + a_n$. Then
\[ \sum_{n=1}^{\infty} \Bigl(\frac{A_n}{n}\Bigr)^p < \Bigl(\frac{p}{p-1}\Bigr)^p \sum_{n=1}^{\infty} a_n^p \tag{1} \]
unless $(a_n)$ is identically zero. The constant is best possible.

The integral analogue of Theorem 1 is given as

Theorem 2. Let $1 < p < \infty$, let $f : [0,\infty) \to [0,\infty)$ be $p$-th power integrable, and let $F(x) = \int_0^x f(t)\,dt$. Then
\[ \int_0^{\infty} \Bigl(\frac{F(x)}{x}\Bigr)^p dx < \Bigl(\frac{p}{p-1}\Bigr)^p \int_0^{\infty} f(x)^p\,dx \tag{2} \]
unless $f$ is identically zero. The constant is best possible.

Our proof of Theorem 1 is due to Elliott (1926). Our proof of Theorem 2 is (essentially) Hardy's original proof.

It may help to outline the heuristics that led Hardy to Theorem 1. Hardy's approach to the double series theorem was to treat the sums above and below the diagonal as essentially identical; that is, he reasoned that it would suffice to examine
\[ \sum_{n} \sum_{m \le n} \frac{a_m b_n}{m+n} \le \sum_{n} \frac{b_n}{n} \sum_{m \le n} a_m = \sum_{n} b_n\,\frac{A_n}{n}. \tag{3} \]
Now the conclusion of Hilbert's theorem is, essentially, that the sum on the left converges whenever $\sum_m a_m^2$ and $\sum_n b_n^2$ converge (this by way of an application of Hölder's inequality). But, Hardy reasoned, perhaps the convergence of $\sum_n a_n^2$ already implies the convergence of $\sum_n (A_n/n)^2$, in which case an application of Hölder's inequality to the right-hand side of (3) would lead to a more direct proof of Hilbert's theorem; whence Theorem 1.
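Hardy's guess is easy to test on examples before proving it. The sketch below is a numerical illustration only: the sequence $a_n = 1/n$ and the truncation length are our own choices, and `hardy_sums` is a hypothetical helper. It compares the two sides of Theorem 1 for a finite sequence.

```python
def hardy_sums(a, p):
    """Return (sum of (A_n/n)^p, (p/(p-1))^p * sum of a_n^p) over the given terms."""
    lhs = 0.0
    partial = 0.0  # running value of A_n = a_1 + ... + a_n
    for n, an in enumerate(a, start=1):
        partial += an
        lhs += (partial / n) ** p
    rhs = (p / (p - 1)) ** p * sum(x ** p for x in a)
    return lhs, rhs

a = [1.0 / n for n in range(1, 2001)]  # hypothetical test sequence
for p in (1.5, 2.0, 3.0):
    lhs, rhs = hardy_sums(a, p)
    print(p, lhs, rhs, lhs < rhs)
```

Note how large the constant becomes as $p \to 1^+$: no inequality of this type survives at $p = 1$, since the averages of a summable sequence need not be summable.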


Proof of Theorem 1. The proof begins with an observation that will help us establish strict inequality: We may suppose that $a_1 > 0$. Indeed, if the theorem has been proved in that case, then it will also hold for a sequence $(b_n)$ with $b_1 = 0$ for, in this case, setting $a_n = b_{n+1}$, we would have
\[
\begin{aligned}
\Bigl(\frac{b_1}{1}\Bigr)^p + \Bigl(\frac{b_1 + b_2}{2}\Bigr)^p + \Bigl(\frac{b_1 + b_2 + b_3}{3}\Bigr)^p + \cdots
&= \Bigl(\frac{a_1}{2}\Bigr)^p + \Bigl(\frac{a_1 + a_2}{3}\Bigr)^p + \cdots \\
&\le \Bigl(\frac{a_1}{1}\Bigr)^p + \Bigl(\frac{a_1 + a_2}{2}\Bigr)^p + \cdots \\
&< \Bigl(\frac{p}{p-1}\Bigr)^p \sum_{n} a_n^p = \Bigl(\frac{p}{p-1}\Bigr)^p \sum_{n} b_n^p.
\end{aligned}
\]
We now set $x_n = A_n/n$ and estimate:
\[
\begin{aligned}
x_n^p - \frac{p}{p-1}\,x_n^{p-1} a_n
&= x_n^p - \frac{p}{p-1}\bigl\{n x_n - (n-1)x_{n-1}\bigr\}\,x_n^{p-1} \\
&= \Bigl(1 - \frac{np}{p-1}\Bigr) x_n^p + \frac{(n-1)p}{p-1}\,x_n^{p-1} x_{n-1} \\
&\le \Bigl(1 - \frac{np}{p-1}\Bigr) x_n^p + \frac{(n-1)p}{p-1}\Bigl\{\frac{p-1}{p}\,x_n^p + \frac{1}{p}\,x_{n-1}^p\Bigr\} \qquad (4) \\
&= \frac{1}{p-1}\bigl\{(n-1)x_{n-1}^p - n x_n^p\bigr\},
\end{aligned}
\]
where we've used Young's inequality in (4). Next we sum over $n = 1,\dots,N$, noting that the terms on the right, above, will telescope, and conclude that
\[ \sum_{n=1}^{N} x_n^p - \frac{p}{p-1}\sum_{n=1}^{N} x_n^{p-1} a_n \le -\frac{N x_N^p}{p-1} \le 0. \]
That is,
\[ \sum_{n=1}^{N} x_n^p \le \frac{p}{p-1}\sum_{n=1}^{N} x_n^{p-1} a_n. \]
Applying Hölder's inequality then yields
\[ \sum_{n=1}^{N} x_n^p \le \frac{p}{p-1}\Bigl(\sum_{n=1}^{N} x_n^p\Bigr)^{1-1/p} \Bigl(\sum_{n=1}^{N} a_n^p\Bigr)^{1/p}. \tag{5} \]
Collecting terms and raising both sides to the $p$-th power gives us the finite version of our result (but with a weak inequality):
\[ \sum_{n=1}^{N} x_n^p \le \Bigl(\frac{p}{p-1}\Bigr)^p \sum_{n=1}^{N} a_n^p. \]
It follows that
\[ \sum_{n=1}^{\infty} x_n^p \le \Bigl(\frac{p}{p-1}\Bigr)^p \sum_{n=1}^{\infty} a_n^p, \tag{6} \]
where both sides of the inequality are finite. Finally, to see that we actually have strict inequality, note that equality in (5) would force $(x_n^p)$ and $(a_n^p)$ to be proportional. But we took $a_1 = x_1 > 0$, so the constant of proportionality would have to be 1. That is, we would have $a_n = A_n/n$ for all $n$, which can only occur if $(a_n)$ is constant. (And this is consistent with equality in (4), which would force $(x_n)$ to be constant.) This is obviously inconsistent with the fact that $(a_n)$ is $p$-th power summable; thus, (6) (and (1)) actually holds with strict inequality.

To see that $\{p/(p-1)\}^p$ is the best constant, one approach is to consider the sequence $a_n = n^{-(1/p)-\varepsilon}$ and let $\varepsilon \to 0^+$. It's a straightforward estimate, but we'll skip the details.


We next give (what is essentially) Hardy's proof of Theorem 2. It is very similar in spirit (if not in detail) to the proof we've just given for Theorem 1.

Proof of Theorem 2. As in the proof of Theorem 1, we will first establish a slightly weaker version of (2). In particular, we will first show that
\[ \int_0^{T} \Bigl(\frac{F(x)}{x}\Bigr)^p dx \le \Bigl(\frac{p}{p-1}\Bigr)^p \int_0^{T} f(x)^p\,dx \tag{2$'$} \]
for all $T$ sufficiently large. To this end, note that if $f$ is not identically zero, then $\int_0^{\infty} f^p > 0$ and, hence, $\int_0^{T} f^p > 0$ for all $T$ sufficiently large. (This expression is analogous to the sum $\sum_{n=1}^{N} a_n^p$, and we will need to divide by it later in the proof, much as we did in the proof of Theorem 1.)

Next, note that if $f$ is $p$-th power integrable, then $f$ is integrable over $[0,x]$ for any $x > 0$ and, moreover,
\[ F(x)^p = \Bigl(\int_0^x f(t)\,dt\Bigr)^p \le x^{p-1}\int_0^x f(t)^p\,dt \]
from Hölder's inequality. In particular, we have $x^{1-p}F(x)^p \to 0$ as $x \to 0^+$, a fact that will come in handy momentarily.

We're ready to attack (2$'$). We first integrate by parts, then apply Hölder's inequality:
\[
\begin{aligned}
\int_0^{T} \Bigl(\frac{F(x)}{x}\Bigr)^p dx
&= \frac{1}{1-p}\int_0^{T} F(x)^p\,d\bigl(x^{1-p}\bigr) \\
&= \frac{1}{1-p}\Bigl[x^{1-p}F(x)^p\Bigr]_0^{T} + \frac{p}{p-1}\int_0^{T} x^{1-p}F(x)^{p-1}f(x)\,dx \\
&= \frac{p}{p-1}\int_0^{T} \Bigl(\frac{F(x)}{x}\Bigr)^{p-1} f(x)\,dx - \frac{T^{1-p}F(T)^p}{p-1} \qquad (7) \\
&\le \frac{p}{p-1}\int_0^{T} \Bigl(\frac{F(x)}{x}\Bigr)^{p-1} f(x)\,dx \qquad (8) \\
&\le \frac{p}{p-1}\Bigl(\int_0^{T} f(x)^p\,dx\Bigr)^{1/p} \Bigl(\int_0^{T} \Bigl(\frac{F(x)}{x}\Bigr)^p dx\Bigr)^{1-1/p}, \qquad (9)
\end{aligned}
\]
where (in (7)) we've used the fact that $x^{1-p}F(x)^p \to 0$ as $x \to 0^+$ and (in (8)) the fact that $F \ge 0$. (Note that if $(F(x)/x)^p$ is integrable over $[0,\infty)$, as the theorem suggests, then we would expect to have $x(F(x)/x)^p \to 0$ as $x \to \infty$. Thus, dropping the term $T^{1-p}F(T)^p$ is unlikely to cause any problems.)

We've been here before! $\int_0^{T} (F/x)^p$ occurs on both sides of our inequality, but to different powers. Divide, then raise both sides to the $p$-th power to arrive at (2$'$):
\[ \int_0^{T} \Bigl(\frac{F(x)}{x}\Bigr)^p dx \le \Bigl(\frac{p}{p-1}\Bigr)^p \int_0^{T} f(x)^p\,dx. \]
Because this inequality holds for all $T$ sufficiently large, we've proved that
\[ \int_0^{\infty} \Bigl(\frac{F(x)}{x}\Bigr)^p dx \le \Bigl(\frac{p}{p-1}\Bigr)^p \int_0^{\infty} f(x)^p\,dx, \tag{10} \]
which is very nearly the conclusion we were hoping for. All that's missing is strict inequality, which follows from a closer examination of our application of Hölder's inequality in (9). Indeed, equality in (10) would force equality in (9), which in turn would force $(F(x)/x)^p$ and $f(x)^p$ to be proportional. But this would force $f$ to be a power of $x$, which is inconsistent with the assumption that $f$ is $p$-th power integrable over $[0,\infty)$. (If we assume for the moment that $f$ is continuous, then the equation $x f(x) = C\int_0^x f$ would imply that $f$ is differentiable and satisfies the differential equation $x f'(x) = (C-1)f(x)$, which has solution $f(x) = Bx^{C-1}$.)

As with Theorem 1, the proof that the constant $\{p/(p-1)\}^p$ is best would follow from considering, for example, the function $f(x) = x^{-(1/p)-\varepsilon}$, $x \ge 1$, and $f(x) = 0$, otherwise. Again, we will forego the details.

The Inequalities of Carleman, Knopp, Jensen, and Carleson


We begin with a simple application of Hardy's Theorem 1.

Corollary. Let $1 < p < \infty$ and let $(a_n)$ be nonnegative. Then
\[ \sum_{n=1}^{\infty} \Bigl(\frac{a_1^{1/p} + a_2^{1/p} + \cdots + a_n^{1/p}}{n}\Bigr)^p < \Bigl(\frac{p}{p-1}\Bigr)^p \sum_{n=1}^{\infty} a_n. \]

Note that the summand on the left can be written as $M_{1/p}^{(n)}(a)$, where the superscript $(n)$ is a reminder that we're summing only $n$ terms. Recall, too, that $M_0^{(n)}(a) = (a_1 a_2 \cdots a_n)^{1/n} \le M_{1/p}^{(n)}(a)$; in fact, if we let $p \to \infty$ (with $n$ still fixed), $M_{1/p}^{(n)}(a)$ decreases to $M_0^{(n)}(a)$ while the constant on the right tends to $e$. Thus, we have another "name" inequality, due to Carleman in 1923 (but without a proof of strict inequality):

Carleman's Inequality. For $(a_n)$ nonnegative,
\[ \sum_{n=1}^{\infty} (a_1 a_2 \cdots a_n)^{1/n} < e \sum_{n=1}^{\infty} a_n \tag{1} \]
provided that $(a_n)$ is not identically zero. The constant is best possible.

An elementary yet elegant proof of Carleman's inequality, due to Pólya in 1926, is outlined in the exercises. We'll give a second proof shortly. The proof that the constant $e$ is best possible will be left for another day.
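Carleman's inequality can be checked numerically for any finite positive sequence (extend it by zeros, so that the remaining geometric means vanish). The sketch below is illustrative only; the test sequences and the helper name `carleman_sums` are our own assumptions.

```python
import math

def carleman_sums(a):
    """Return (sum of (a_1...a_n)^(1/n), e * sum of a_n) for positive terms a."""
    log_sum = 0.0  # running value of log(a_1) + ... + log(a_n)
    lhs = 0.0
    for n, an in enumerate(a, start=1):
        log_sum += math.log(an)
        lhs += math.exp(log_sum / n)  # geometric mean of the first n terms
    return lhs, math.e * sum(a)

squares = [1.0 / n ** 2 for n in range(1, 1001)]
harmonic = [1.0 / n for n in range(1, 1001)]
for seq in (squares, harmonic):
    lhs, rhs = carleman_sums(seq)
    print(lhs, rhs, lhs / rhs)
```

Working with logarithms avoids underflow in the product $a_1 a_2 \cdots a_n$.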
There is an integral analogue of Carleman's inequality. To find it, we first rewrite (1) as $\sum_{n=1}^{\infty} \exp\bigl(\frac{1}{n}\sum_{k=1}^{n} \log a_k\bigr) < e \sum_{n=1}^{\infty} a_n$, which suggests:

Knopp's Inequality. If $f : [0,\infty) \to (0,\infty)$ is integrable, then
\[ \int_0^{\infty} \exp\Bigl\{\frac{1}{x}\int_0^x \log f(t)\,dt\Bigr\}\,dx < e \int_0^{\infty} f(x)\,dx \]
unless $f$ is identically zero (and we interpret $\exp(\log(0)) = 0$).

Curiously, Knopp's original inequality (from 1928) was stated with 1 as the lower limit of integration (in all instances), and that inequality is actually false! It fails for the function $f(x) = 1/x^2$, which can be verified by direct calculation. Nevertheless, Knopp's original proof can be used to prove the corrected inequality.

We'll give two proofs of Knopp's inequality. First, though, we'll take a short detour to prove the integral version of Jensen's inequality, which is not only of interest in its own right, but which is also needed for Knopp's proof.


Jensen's Inequality. Let $f$, $w : I \to \mathbb{R}$ be integrable functions with $w(x) \ge 0$ and $\int_I w(x)\,dx = 1$. Let $\Phi : J \to \mathbb{R}$ be a convex function defined on an interval $J$ containing the range of $f$. Then
\[ \Phi\Bigl(\int_I f(x)\,w(x)\,dx\Bigr) \le \int_I \Phi(f(x))\,w(x)\,dx. \]

Proof. Let $\mu = \int_I f(x)\,w(x)\,dx$ and let $T(x) = \Phi(\mu) + m(x - \mu)$ be a supporting line to the graph of $\Phi$ at $\mu$. (Recall that we have $T(x) \le \Phi(x)$ for all $x$ and $T(\mu) = \Phi(\mu)$.) Then $\Phi(\mu) - \Phi(f(x)) \le m(\mu - f(x))$ for all $x \in I$. Multiplying by $w$ and integrating over $I$ then leads to: $\Phi(\mu) - \int_I \Phi(f(x))\,w(x)\,dx \le m(\mu - \mu) = 0$.

If we set $\Phi(x) = e^x$, we have the following integral analogue of the arithmetic-geometric mean inequality:

Corollary. Let $f$, $w : I \to (0,\infty)$ with $\int_I w(x)\,dx = 1$. Then
\[ \exp\Bigl(\int_I \log(f(x))\,w(x)\,dx\Bigr) \le \int_I f(x)\,w(x)\,dx. \]

A special case of Jensen's inequality is worth stating separately:

Corollary. Let $f : [a,b] \to \mathbb{R}$ be continuous and let $\Phi : J \to \mathbb{R}$ be a convex function defined on an interval $J$ containing the range of $f$. Then
\[ \Phi\Bigl(\frac{1}{b-a}\int_a^b f(t)\,dt\Bigr) \le \frac{1}{b-a}\int_a^b \Phi(f(t))\,dt. \]
If $\Phi$ is strictly convex, then equality can only occur if $f$ is constant.

Proof. We only need to prove the last assertion. As before, let $\mu = (b-a)^{-1}\int_a^b f(t)\,dt$ and let $T(x) = \Phi(\mu) + m(x - \mu)$ be a supporting line for $\Phi$ at $\mu$. If $\Phi$ is strictly convex, note that $\Phi(y) - \Phi(\mu) > m(y - \mu)$ for $y \ne \mu$. Now, if $f$ is not constant, we have $f(x) \ne \mu$ for all $x$ in some subinterval $I$ of $[a,b]$. Thus $\Phi(f(x)) - \Phi(\mu) > m(f(x) - \mu)$ on $I$ and $\Phi(f(x)) - \Phi(\mu) \ge m(f(x) - \mu)$ in any case. It follows that $\int_a^b \bigl(\Phi(f(x)) - \Phi(\mu)\bigr)\,dx > 0$.

Conversely, $\int_a^b \bigl(\Phi(f(x)) - \Phi(\mu)\bigr)\,dx = 0$ would mean that $\Phi(f(x)) = \Phi(\mu)$ for all $x$. But strictly convex functions can take on the same value at most twice; thus, $f(x)$ assumes at most two values. As $f$ is continuous, this forces $f$ to be constant.
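As a concrete instance of the integral arithmetic-geometric mean inequality, take $f(x) = 1 + x$ on $[0,1]$ with $w \equiv 1$ (an example of our own choosing). The geometric mean is $\exp\bigl(\int_0^1 \log(1+x)\,dx\bigr) = 4/e$, while the arithmetic mean is $3/2$. The sketch below reproduces both values with a simple midpoint rule; `midpoint` is a hypothetical helper, not a library routine.

```python
import math

def midpoint(g, a, b, steps=10000):
    """Composite midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def f(x):
    return 1.0 + x

geometric = math.exp(midpoint(lambda x: math.log(f(x)), 0.0, 1.0))
arithmetic = midpoint(f, 0.0, 1.0)

print(geometric, 4 / math.e)  # geometric mean and its exact value 4/e
print(arithmetic)             # arithmetic mean, exactly 3/2
```

Since $e^x$ is strictly convex and $\log f$ is not constant here, the inequality is strict, in line with the second Corollary.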


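As a second application of Jensen's inequality, we'll revisit an inequality we saw last week; namely,
\[ \int_{m}^{m+1}\!\int_{n}^{n+1} \frac{dy\,dx}{x+y} > \frac{1}{m+n+1}. \]
To see how this follows from Jensen's inequality, again note that the function $h(x) = 1/(c+x)$ is strictly convex for $x > -c$. Thus,
\[ \int_{n}^{n+1} \frac{dy}{x+y} > \frac{1}{x + \int_{n}^{n+1} y\,dy} = \frac{1}{x+n+1/2}. \]
Similarly,
\[ \int_{m}^{m+1} \frac{dx}{x+n+1/2} > \frac{1}{n + 1/2 + \int_{m}^{m+1} x\,dx} = \frac{1}{m+n+1}. \]

One more easy calculation and we'll be ready to prove Knopp's inequality.

Lemma. Let $f : [0,\infty) \to (0,\infty)$ be integrable and define $g(x) = \frac{1}{x^2}\int_0^x y f(y)\,dy$. Then
\[ \int_0^{\infty} g(x)\,dx = \int_0^{\infty} f(x)\,dx. \]

Proof. This is a simple matter of changing the order of integration.
\[ \int_0^{\infty} g(x)\,dx = \int_0^{\infty} \frac{1}{x^2}\int_0^x y f(y)\,dy\,dx = \int_0^{\infty} y f(y)\int_y^{\infty} \frac{dx}{x^2}\,dy = \int_0^{\infty} f(y)\,dy. \]

Proof of Knopp's inequality. Recall that an antiderivative for $\log x$ is $x\log x - x$ (which vanishes at 0). Thus,
\[
\begin{aligned}
\exp\Bigl\{\frac{1}{x}\int_0^x \log f(y)\,dy\Bigr\}
&= \exp\Bigl\{\frac{1}{x}\int_0^x \log(y f(y))\,dy - \frac{1}{x}\int_0^x \log y\,dy\Bigr\} \\
&= \exp\Bigl\{\frac{1}{x}\int_0^x \log(y f(y))\,dy\Bigr\}\exp(1 - \log x) \\
&= \frac{e}{x}\exp\Bigl\{\frac{1}{x}\int_0^x \log(y f(y))\,dy\Bigr\} \\
&\le \frac{e}{x}\cdot\frac{1}{x}\int_0^x \exp\bigl(\log(y f(y))\bigr)\,dy \qquad (2) \\
&= \frac{e}{x^2}\int_0^x y f(y)\,dy,
\end{aligned}
\]
where, in (2), we've applied Jensen's inequality. An appeal to our previous Lemma yields
\[ \int_0^{\infty} \exp\Bigl\{\frac{1}{x}\int_0^x \log f(t)\,dt\Bigr\}\,dx \le e\int_0^{\infty} f(x)\,dx. \tag{3} \]
Now equality in (3) would force equality in (2) for all $x$, which would in turn force $\log(y f(y))$ to be constant. If $f$ is not identically zero, this would mean that $f(y) = c/y$ for some positive constant $c$. This is clearly inconsistent with the fact that $f$ is integrable over $(0,\infty)$. Thus, we have strict inequality in (3) unless $f = 0$.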
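For a concrete check of Knopp's inequality, take $f(x) = e^{-x}$ (an example of our own). Then $\frac{1}{x}\int_0^x \log f(t)\,dt = -x/2$, so the left-hand side equals $\int_0^{\infty} e^{-x/2}\,dx = 2$, comfortably below $e\int_0^{\infty} e^{-x}\,dx = e$. The sketch below recovers the value 2 numerically; `knopp_left` is a hypothetical helper, and the truncation point and step count are arbitrary.

```python
import math

def knopp_left(log_f, upper, steps):
    """Midpoint approximation of the integral over [0, upper] of
    exp((1/x) * integral_0^x log f(t) dt)."""
    h = upper / steps
    inner = 0.0  # running integral of log f over [0, k*h]
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        val = log_f(x)
        # Approximate the integral of log f over [0, x] by adding half a cell.
        inner_at_x = inner + 0.5 * h * val
        total += h * math.exp(inner_at_x / x)
        inner += h * val
    return total

left = knopp_left(lambda t: -t, upper=40.0, steps=40000)
right = math.e * 1.0  # e times the integral of e^{-x} over [0, infinity)
print(left, right)
```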


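We next present an inequality due to Lennart Carleson (1954) which is at least partly related to Jensen's inequality, through its style of proof, and which is at least partly related to Hardy's inequality, in a way that will become apparent later. Moreover, it will yield both Carleman's inequality and Knopp's inequality as corollaries.

Carleson's Inequality. Let $\varphi : [0,\infty) \to \mathbb{R}$ be a differentiable convex function with $\varphi(0) = 0$. Then, for any $-1 < \alpha < \infty$, we have
\[ \int_0^{\infty} x^{\alpha}\exp\Bigl(-\frac{\varphi(x)}{x}\Bigr)\,dx \le e^{\alpha+1}\int_0^{\infty} x^{\alpha}\exp\bigl(-\varphi'(x)\bigr)\,dx. \tag{4} \]
The constant $e^{\alpha+1}$ is best possible.

Proof. To begin, note that if $p > 1$, then the convexity of $\varphi$ assures us that
\[ \frac{\varphi(py) - \varphi(y)}{py - y} \ge \varphi'(y), \tag{5} \]
which we will write as: $-\varphi(py) \le -\varphi(y) - (p-1)\,y\,\varphi'(y)$. Now we estimate the left-hand side of (4) using (5) and Hölder's inequality with $q = p/(p-1)$, the conjugate to $p$.
\[
\begin{aligned}
\int_0^{A} x^{\alpha}\exp\Bigl(-\frac{\varphi(x)}{x}\Bigr)\,dx
&= p^{\alpha+1}\int_0^{A/p} y^{\alpha}\exp\Bigl(-\frac{\varphi(py)}{py}\Bigr)\,dy \\
&\le p^{\alpha+1}\int_0^{A} y^{\alpha}\exp\Bigl(-\frac{\varphi(y)}{py} - \frac{p-1}{p}\,\varphi'(y)\Bigr)\,dy \\
&= p^{\alpha+1}\int_0^{A} y^{\alpha/p}\exp\Bigl(-\frac{\varphi(y)}{py}\Bigr)\cdot y^{\alpha/q}\exp\Bigl(-\frac{\varphi'(y)}{q}\Bigr)\,dy \\
&\le p^{\alpha+1}\Bigl[\int_0^{A} y^{\alpha}\exp\Bigl(-\frac{\varphi(y)}{y}\Bigr)\,dy\Bigr]^{1/p}\Bigl[\int_0^{A} y^{\alpha}\exp\bigl(-\varphi'(y)\bigr)\,dy\Bigr]^{1/q}.
\end{aligned}
\]
We now divide by $\bigl[\int_0^{A} y^{\alpha}\exp(-\varphi(y)/y)\,dy\bigr]^{1/p}$ and raise both sides to the $q$-th power to arrive at
\[ \int_0^{A} x^{\alpha}\exp\Bigl(-\frac{\varphi(x)}{x}\Bigr)\,dx \le \bigl(p^{\alpha+1}\bigr)^{q}\int_0^{A} x^{\alpha}\exp\bigl(-\varphi'(x)\bigr)\,dx. \]
To finish the proof, we first let $A \to \infty$ and then let $p \to 1^+$ or, equivalently, let $q \to \infty$. Written in terms of $q$ we have
\[ p^{(\alpha+1)q} = \Bigl(\frac{q}{q-1}\Bigr)^{(\alpha+1)q} = \Bigl(1 + \frac{1}{q-1}\Bigr)^{(\alpha+1)q} \to e^{\alpha+1} \quad\text{as } q \to \infty. \]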

As an application of Carleson’s inequality, let’s see how it can be used to derive 


Carleman’s inequality. 


First suppose that the sequence (a,,) is nonnegative and decreasing and define so = 0, 
Sn = >>;_, log(1/az,). Now let y be the piecewise linear continuous function with “nodes” 
or “corners” at (n,y(n)) = (n,s,). That is y(n) = s, for each n and yg is defined 
linearly on each interval (n — 1,n). Then, on the open interval (n — 1,n), we’ll have 
y' (x) = Sn — 8n-1 = log(1/a,). But if (a,) decreases, then vy’ will increase; thus, y will 
be convex. Also, because y(0) = 0, the function y(x)/x will increase—it’s the slope of the 


chord from (0,0) to (x, y(a)). Moreover, note that for n — 1 < x < n we have 


Ate) <0) _ 2 tog(t/ax). (6) 
k=1 
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Equality in (6) for all n and x would force (az) to be constant, which is plainly inconsistent 
with the summability of (a,). Thus, we must have strict inequality in (6), at least for 


certain n and x (which, by the continuity of y, will be good enough for our purposes). 


Finally, notice that 


(TI) “=e0 (222) < f° en (22) a 


where the inequality is strict, at least for certain values of n, per our discussion of (6). 


Summing this inequality and applying Carleson’s inequality leads to 


3 (iI) ” < [ (-82) haa c [ex (-o'@)) ax = itm 


where, again, strict inequality follows from our remarks concerning (6). That is, we have 


Carleman’s inequality for decreasing sequences (a,,). But that hardly matters: 


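Claim: If $(a_n)$ is decreasing and if $(b_n)$ is any rearrangement of $(a_n)$, then $b_1 b_2 \cdots b_n \le a_1 a_2 \cdots a_n$ for all $n$.

Indeed, if $n$ is fixed, we may suppose that $b_1, b_2, \ldots, b_n$ have been arranged in decreasing order, in which case it’s clear that $b_1 \le a_1$, $b_2 \le a_2$, and so on. Thus, $b_1 b_2 \cdots b_n \le a_1 a_2 \cdots a_n$. It follows that

\[ \sum_{n=1}^\infty (b_1 b_2 \cdots b_n)^{1/n} \le \sum_{n=1}^\infty (a_1 a_2 \cdots a_n)^{1/n} < e \sum_{n=1}^\infty a_n = e \sum_{n=1}^\infty b_n. \]

(Because $a_n \ge 0$, the series is unconditionally summable; that is, every rearrangement of $(a_n)$ will sum to the same value.)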
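As a quick numerical sanity check of the discussion above, here is a minimal sketch in Python. The test sequence $a_n = 1/n^2$, the truncation at 2000 terms, and the helper name `carleman_sides` are all choices made for this illustration, not part of the notes. It compares the sum of geometric means for the decreasing arrangement, for one rearrangement, and the bound $e \sum a_n$.

```python
import math

def carleman_sides(a):
    """Return (sum of (a_1...a_n)^(1/n), e * sum of a_n) for a finite positive list."""
    log_product = 0.0
    geometric_sum = 0.0
    for n, term in enumerate(a, start=1):
        log_product += math.log(term)              # log(a_1 * ... * a_n)
        geometric_sum += math.exp(log_product / n)  # n-th geometric mean
    return geometric_sum, math.e * sum(a)

# Hypothetical test data: a_n = 1/n^2, truncated to 2000 terms.
decreasing = [1.0 / n**2 for n in range(1, 2001)]
rearranged = decreasing[::-1]                      # same terms, increasing order

lhs_decreasing, rhs = carleman_sides(decreasing)
lhs_rearranged, _ = carleman_sides(rearranged)
print(lhs_rearranged <= lhs_decreasing < rhs)
```

The rearranged sequence gives a smaller left-hand side, which is the content of the Claim: the decreasing arrangement is the worst case.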


Carleson’s inequality also leads to a version of Knopp’s inequality valid for decreasing functions. In this case, the function $\varphi(x) = -\int_0^x \log(f(t))\,dt$ will have $\varphi'(x) = -\log(f(x))$, which is increasing; hence, $\varphi$ is convex.

Corollary. If $f : [0,\infty) \to (0,\infty)$ is decreasing, then

\[ \int_0^\infty \exp\left\{\frac{1}{x}\int_0^x \log(f(t))\,dt\right\} dx \le e \int_0^\infty f(x)\,dx. \]


In light of our discussion of decreasing sequences and our knowledge of Knopp’s result, this Corollary raises a natural question: If $f$ is decreasing, and if $g$ is a “rearrangement” of $f$ (whatever that might mean), does it follow that

\[ \int_0^\infty \exp\left\{\frac{1}{x}\int_0^x \log(g(t))\,dt\right\} dx \le \int_0^\infty \exp\left\{\frac{1}{x}\int_0^x \log(f(t))\,dt\right\} dx? \]
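Problem Set 6

1. (a) Given positive weights $w_1, \ldots, w_n$, show that

\[ \left(\sum_{k=1}^n a_k\right)^2 \le \left(\sum_{k=1}^n \frac{1}{w_k}\right)\left(\sum_{k=1}^n a_k^2 w_k\right). \]

(b) If we set $w_k = w_k(t) = t + k^2/t$ for $t > 0$, show that the first factor on the right is bounded above by $\pi/2$.

(c) Show that:
\[ \min\left\{\sum_{k=1}^n a_k^2 w_k(t) : t > 0\right\} = 2\left(\sum_{k=1}^n a_k^2\right)^{1/2}\left(\sum_{k=1}^n k^2 a_k^2\right)^{1/2}. \]

(d) Finally, conclude that
\[ \left(\sum_{k=1}^n a_k\right)^4 \le \pi^2 \left(\sum_{k=1}^n a_k^2\right)\left(\sum_{k=1}^n k^2 a_k^2\right). \]
This is known as Carlson’s inequality. The constant $\pi^2$ is best possible.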


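2. For any pair of nonnegative, real sequences $(a_m)$ and $(b_n)$, show that

\[ \sum_{m=1}^\infty \sum_{n=1}^\infty \frac{a_m b_n}{\max\{m,n\}} \le 4\left(\sum_{m=1}^\infty a_m^2\right)^{1/2}\left(\sum_{n=1}^\infty b_n^2\right)^{1/2}. \]

[Hint: Mimic either of the proofs we gave for Hilbert’s inequality (with $p = 2$). In the case of the “homogeneous kernel” approach, take $K(x,y) = [\max\{x,y\}]^{-1}$ and note that $\int_0^\infty \left[\sqrt{u}\,\max\{1,u\}\right]^{-1} du = 4$.]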
3. Let $(a_n)$ be a nonnegative sequence and let $(b_n)$ denote a rearrangement of $(a_n)$. If $(b_n)$ is decreasing, show that $\sum_{n=1}^\infty (A_n/n)^2 \le \sum_{n=1}^\infty (B_n/n)^2$, where $A_n = a_1 + a_2 + \cdots + a_n$ and $B_n = b_1 + b_2 + \cdots + b_n$. [Hint: Argue, for example, that the left-hand side increases if a pair of “out of order” terms $a_i < a_j$, where $i < j$, are swapped.] Conclude that it suffices to prove Hardy’s inequality (for series) in the case where $(a_n)$ is decreasing.


4. Let $(a_n)$ be nonnegative and decreasing, and define $f(t) = a_n$ for $n-1 \le t < n$, $n = 1, 2, \ldots$. Then $\sum_{n=1}^\infty a_n^2 = \int_0^\infty f(t)^2\,dt$. As usual, we define $F(x) = \int_0^x f(t)\,dt$.
(i) Show that $F(x)/x$ decreases from $A_n/n$ to $A_{n+1}/(n+1)$ over $n \le x \le n+1$ and conclude that $\int_0^\infty (F(x)/x)^2\,dx \ge \sum_{n=1}^\infty (A_n/n)^2$.
(ii) Deduce that Hardy’s series inequality for $(a_n)$ follows from his integral inequality for $f$. Thus, from Problem 3, the integral inequality implies the series inequality for any nonnegative (not necessarily decreasing) sequence.
5. Here is an outline of an alternate proof of Hardy’s integral inequality due to Ingham. In what follows, all functions are nonnegative and $1 < p < \infty$. Parts (a)-(c) and (e) are independent; only part (d) depends on the prior steps.

(a) Show that
\[ \left(\int_0^1 \left(\int_0^1 f(x,y)\,dx\right)^p dy\right)^{1/p} \le \int_0^1 \left(\int_0^1 f(x,y)^p\,dy\right)^{1/p} dx. \]
[Hint: Write $F(y) = \int_0^1 f(x,y)\,dx$. Use Hölder on $I = \int_0^1 F(y)^p\,dy = \int_0^1 \int_0^1 F(y)^{p-1} f(x,y)\,dy\,dx$. For a stronger result, show that the inequality is strict unless $f(x,y) = h(x)k(y)$.]

(b) If $F(x) = \int_0^x f(t)\,dt$, show that $F(y)/y = \int_0^1 f(xy)\,dx$.

(c) Show that $\left(\int_0^1 f(xy)^p\,dy\right)^{1/p} \le \left(\frac{1}{x}\int_0^1 f(t)^p\,dt\right)^{1/p}$.

(d) Show that $\left(\int_0^1 (F(y)/y)^p\,dy\right)^{1/p} \le \frac{p}{p-1}\left(\int_0^1 f(t)^p\,dt\right)^{1/p}$.

(e) This proves Hardy’s inequality over $[0,1]$. It’s possible to deduce the inequality over $[0,\infty)$ from (d). How? [Hint: First try to deduce the result over $[0,b]$ by means of a simple change of variable.]


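6. Here is an outline of Pólya’s proof of Carleman’s inequality; it’s legendary for its elegance and acuity. Let $(a_n)$ be a sequence of positive real numbers.

(a) Given a positive sequence $(c_n)$, justify the steps in the following calculation:

\[ \begin{aligned}
\sum_{n=1}^\infty (a_1 a_2 \cdots a_n)^{1/n} &= \sum_{n=1}^\infty \left(\frac{c_1 a_1 \cdot c_2 a_2 \cdots c_n a_n}{c_1 c_2 \cdots c_n}\right)^{1/n} \\
&\le \sum_{n=1}^\infty (c_1 c_2 \cdots c_n)^{-1/n}\, \frac{1}{n} \sum_{m=1}^n c_m a_m \\
&= \sum_{m=1}^\infty a_m c_m \sum_{n=m}^\infty \frac{1}{n} (c_1 c_2 \cdots c_n)^{-1/n}.
\end{aligned} \]

(b) If $c_m = (m+1)^m/m^{m-1}$, show that $(c_1 c_2 \cdots c_n)^{-1/n} = 1/(n+1)$ and conclude that

\[ \sum_{n=m}^\infty \frac{1}{n}(c_1 c_2 \cdots c_n)^{-1/n} = \sum_{n=m}^\infty \frac{1}{n(n+1)} = \frac{1}{m}. \]

(c) Finally, deduce that

\[ \sum_{n=1}^\infty (a_1 a_2 \cdots a_n)^{1/n} \le \sum_{m=1}^\infty \frac{a_m c_m}{m} = \sum_{m=1}^\infty a_m \left(1 + \frac{1}{m}\right)^m < e \sum_{m=1}^\infty a_m. \]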


7. Show that the constant $e^{\alpha+1}$ in Carleson’s inequality is best possible. [Hint: Let $a > \alpha$ and define $\varphi(x) = (a+1)\,x \log x$ for $x > 1$, $\varphi(x) = 0$ otherwise.]
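Problem Set 6, Problem 1

1. (a) Given positive weights $w_1, \ldots, w_n$, show that

\[ \left(\sum_{k=1}^n a_k\right)^2 \le \left(\sum_{k=1}^n \frac{1}{w_k}\right)\left(\sum_{k=1}^n a_k^2 w_k\right). \]

(b) If we set $w_k = w_k(t) = t + k^2/t$ for $t > 0$, show that the first factor on the right is bounded above by $\pi/2$.

(c) Show that:
\[ \min\left\{\sum_{k=1}^n a_k^2 w_k(t) : t > 0\right\} = 2\left(\sum_{k=1}^n a_k^2\right)^{1/2}\left(\sum_{k=1}^n k^2 a_k^2\right)^{1/2}. \]

(d) Finally, conclude that
\[ \left(\sum_{k=1}^n a_k\right)^4 \le \pi^2 \left(\sum_{k=1}^n a_k^2\right)\left(\sum_{k=1}^n k^2 a_k^2\right). \]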
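Solution. (a) This is a straight application of Cauchy-Schwarz:

\[ \sum_{k=1}^n a_k = \sum_{k=1}^n \frac{1}{\sqrt{w_k}} \cdot a_k \sqrt{w_k} \le \left(\sum_{k=1}^n \frac{1}{w_k}\right)^{1/2} \left(\sum_{k=1}^n a_k^2 w_k\right)^{1/2}. \]

(b) For $w_k = w_k(t) = t + k^2/t$ we have, by the integral test,

\[ \sum_{k=1}^n \frac{1}{w_k} = \sum_{k=1}^n \frac{t}{t^2 + k^2} \le \int_0^\infty \frac{t}{t^2 + x^2}\,dx = \int_0^\infty \frac{1}{1 + x^2}\,dx = \frac{\pi}{2}. \]

(c) Note that $\sum_{k=1}^n a_k^2 w_k(t) = t \sum_{k=1}^n a_k^2 + \frac{1}{t} \sum_{k=1}^n k^2 a_k^2 = At + \frac{B}{t}$, which is minimized when $t = (B/A)^{1/2}$ (and its minimum value is $2A^{1/2}B^{1/2}$).

(d) Finally, we assemble the pieces. From (a) and (b) we have $\left(\sum_{k=1}^n a_k\right)^2 \le \frac{\pi}{2} \sum_{k=1}^n a_k^2 w_k(t)$ for all $t > 0$. Thus, from (c),

\[ \left(\sum_{k=1}^n a_k\right)^2 \le \frac{\pi}{2} \cdot 2 \left(\sum_{k=1}^n a_k^2\right)^{1/2} \left(\sum_{k=1}^n k^2 a_k^2\right)^{1/2}. \]

Squaring both sides completes the solution (except for the claim that the constant $\pi^2$ is best possible).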
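For readers who like to see numbers, the following minimal Python sketch spot-checks Carlson’s inequality on a few random nonnegative vectors. The random data, the seed, and the helper name `carlson_sides` are assumptions made for this illustration; a numerical check is of course no substitute for the proof above.

```python
import math
import random

def carlson_sides(a):
    """Return ((sum a_k)^4, pi^2 * (sum a_k^2) * (sum k^2 a_k^2)) for a finite list a."""
    A = sum(x * x for x in a)                                  # A = sum of a_k^2
    B = sum((k * x) ** 2 for k, x in enumerate(a, start=1))    # B = sum of k^2 a_k^2
    return sum(a) ** 4, math.pi ** 2 * A * B

random.seed(0)   # arbitrary seed, for reproducibility only
trials = [[random.random() for _ in range(n)] for n in (1, 5, 50, 500)]
results = [carlson_sides(a) for a in trials]
print(all(lhs <= rhs for lhs, rhs in results))
```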
Majorization and Schur Convexity


In this section we will discuss an order relation that may help to explain—and consolidate— 
many of the elementary inequalities that we’ve encountered. As we’ll see, majorization, as 


the study and application of this order relation is called, is a fruitful undertaking. 


Before we say more, let’s make a few definitions and look at a few easy examples. To begin, given a finite length sequence of real numbers $x = (x_1, x_2, \ldots, x_n)$, we will denote the decreasing rearrangement of $x$ by $x^* = (x_i^*)$. That is, $x_1^* = \max\{x_k : 1 \le k \le n\} = x_{k_1}$, $x_2^* = \max\{x_k : 1 \le k \le n,\ k \ne k_1\} = x_{k_2}$, and so on. It would be more accurate to say that $x^*$ is the non-increasing rearrangement of $x$, but that’s rather awkward. We’ll stick with decreasing.


Here’s an easy example of the decreasing rearrangement in action:

Proposition. For $x, y \in \mathbb{R}^n$ we have
\[ \sum_{i=1}^n x_i^* y_{n-i+1}^* \le \sum_{i=1}^n x_i y_i \le \sum_{i=1}^n x_i^* y_i^*. \]

Proof. We only need to prove the second inequality; the first then follows by considering $x$ and $-y$. To begin, we may suppose that one of the sequences, let’s say $x$, is already in decreasing order but $y$ is not. Thus, there are a pair of indices $j < k$ such that $y_j < y_k$ while $x_j \ge x_k$. Now consider:

\[ x_j y_k + x_k y_j - (x_j y_j + x_k y_k) = (x_k - x_j)(y_j - y_k) \ge 0. \]

It follows that the sum $\sum_{i=1}^n x_i y_i$ can only increase if we exchange $y_j$ and $y_k$. After a finite number of such exchanges, $y$ would be in decreasing order.
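The Proposition is easy to test by brute force on a small example. The sketch below is only an illustration: the vectors `x` and `y` are arbitrary, and the helper names `dot` and `rearrangement_bounds` are hypothetical. It pairs $x$ with every permutation of $y$ and compares the extreme values with the two bounds.

```python
from itertools import permutations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rearrangement_bounds(x, y):
    """Bounds from the Proposition: oppositely ordered and similarly ordered sums."""
    xs = sorted(x, reverse=True)          # x*
    ys = sorted(y, reverse=True)          # y*
    return dot(xs, ys[::-1]), dot(xs, ys)

# Arbitrary sample data (integers keep the arithmetic exact).
x = [3, -1, 4, 1]
y = [2, 7, -1, 8]
low, high = rearrangement_bounds(x, y)
values = [dot(x, p) for p in permutations(y)]
print(low, min(values), max(values), high)
```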




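We next define an order relation, written $x \prec y$, read “$x$ is majorized by $y$” (or “$y$ majorizes $x$”), and defined by the following string of inequalities (plus one equation):

\[ \begin{aligned}
x_1^* &\le y_1^* \\
x_1^* + x_2^* &\le y_1^* + y_2^* \\
&\ \ \vdots \\
x_1^* + \cdots + x_{n-1}^* &\le y_1^* + \cdots + y_{n-1}^* \\
x_1^* + x_2^* + \cdots + x_n^* &= y_1^* + y_2^* + \cdots + y_n^*
\end{aligned} \]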
Example. $(1,1,1,1) \prec (0,1,2,1) \prec (2,0,2,0) \prec (1,3,0,0) \prec (0,0,4,0)$.

Judging by this single example, it would seem that “smaller” sequences are more evenly distributed, while “bigger” sequences are less evenly distributed. It’s also reasonably clear that “$\prec$” is transitive; thus, $x \prec y$ and $y \prec z$ imply that $x \prec z$. But, because the relation doesn’t depend on the order of the terms, we can’t expect to have, say, $x \prec y$ and $y \prec x$ imply that $x = y$. It would, however, imply that $x^* = y^*$. Also, because we have one equation to satisfy, not all pairs of sequences will be comparable; for example, $(1,1,1)$ and $(1,0,1)$ are incomparable. Finally, given any $x$ and any permutations $\sigma$ and $\tau$ of $\{1,2,\ldots,n\}$, the rearrangements $x_\sigma = (x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ and $x_\tau$ satisfy $x_\sigma \prec x \prec x_\tau$.
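The definition translates directly into a few lines of code. Here is a minimal sketch, assuming nonempty vectors of equal length; the function name `is_majorized_by` and the tolerance parameter `tol` are choices made for this illustration. It sorts both vectors into decreasing order and compares partial sums.

```python
from itertools import accumulate

def is_majorized_by(x, y, tol=1e-12):
    """Return True when x is majorized by y (x ≺ y); assumes nonempty inputs."""
    if len(x) != len(y):
        return False
    px = list(accumulate(sorted(x, reverse=True)))   # partial sums of x*
    py = list(accumulate(sorted(y, reverse=True)))   # partial sums of y*
    if abs(px[-1] - py[-1]) > tol:                   # the one equation: totals agree
        return False
    return all(s <= t + tol for s, t in zip(px[:-1], py[:-1]))

chain = [(1, 1, 1, 1), (0, 1, 2, 1), (2, 0, 2, 0), (1, 3, 0, 0), (0, 0, 4, 0)]
print(all(is_majorized_by(u, v) for u, v in zip(chain, chain[1:])))
print(is_majorized_by((1, 1, 1), (1, 0, 1)), is_majorized_by((1, 0, 1), (1, 1, 1)))
```

The second line of output illustrates incomparability: neither of $(1,1,1)$ and $(1,0,1)$ majorizes the other, because their totals differ.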


In 1923, Schur studied majorization and asked which functions $f : (\mathbb{R}^n)^+ \to \mathbb{R}$ satisfy $f(x) \le f(y)$ whenever $x \prec y$. We now call such functions Schur convex. If, instead, $f(x) \ge f(y)$ whenever $x \prec y$, we say that $f$ is Schur concave. Again, more precise labels might be “Schur increasing” or “Schur monotone,” but so it goes. Because we have $x_\sigma \prec x \prec x_\tau$ for any permutations $\sigma$ and $\tau$, note that a Schur convex/concave function must also be symmetric; that is, $f(x_\sigma) = f(x)$ for any permutation $\sigma$.


Fortunately, Schur gave us more than a label for such functions; he also gave us a test 


for them. 


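Schur’s Criterion. Let $f : (\mathbb{R}^n)^+ \to \mathbb{R}$ be symmetric and continuously differentiable. Then $f$ is Schur convex if and only if, for all $1 \le j, k \le n$ and all $x \in (\mathbb{R}^n)^+$,

\[ (x_j - x_k)\left(\frac{\partial f}{\partial x_j} - \frac{\partial f}{\partial x_k}\right) \ge 0. \]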


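Proof. Because $f$ is symmetric, it follows that $f$ is Schur convex on $(\mathbb{R}^n)^+$ if and only if $f$ is Schur convex on $D = \{x : x_1 \ge x_2 \ge \cdots \ge x_n \ge 0\}$.

Given $x \in (\mathbb{R}^n)^+$, let’s agree to write $\tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_n)$ for the vector with entries $\tilde{x}_k = x_1 + \cdots + x_k$, $1 \le k \le n$. With this notation, the majorization $x \prec y$ (for $x$, $y \in D$) is equivalent to the string of inequalities $\tilde{x}_k \le \tilde{y}_k$, $1 \le k < n$, together with the equation $\tilde{x}_n = \tilde{y}_n$. In this way, majorization is almost the same as the coordinatewise comparison $\tilde{x} \le \tilde{y}$.

In particular, if we put $\tilde{D} = \{\tilde{x} : x \in D\}$ and define $\tilde{f} : \tilde{D} \to \mathbb{R}$ by $\tilde{f}(\tilde{x}) = f(x)$, then $f$ is Schur convex on $D$ if and only if $\tilde{f}$ satisfies $\tilde{f}(\tilde{x}) \le \tilde{f}(\tilde{y})$ whenever $\tilde{x}, \tilde{y} \in \tilde{D}$ satisfy $\tilde{x}_k \le \tilde{y}_k$, $1 \le k < n$, and $\tilde{x}_n = \tilde{y}_n$. That is, $\tilde{f}$ must be nondecreasing in each of its first $n-1$ coordinates; hence,

\[ 0 \le \frac{\partial \tilde{f}(\tilde{x})}{\partial \tilde{x}_j} \quad \text{for all } 1 \le j < n. \]

But $\tilde{f}(\tilde{x}) = f(\tilde{x}_1, \tilde{x}_2 - \tilde{x}_1, \ldots, \tilde{x}_n - \tilde{x}_{n-1})$ so, by the chain rule,

\[ 0 \le \frac{\partial \tilde{f}(\tilde{x})}{\partial \tilde{x}_j} = \frac{\partial f(x)}{\partial x_j}\,\frac{\partial x_j}{\partial \tilde{x}_j} + \frac{\partial f(x)}{\partial x_{j+1}}\,\frac{\partial x_{j+1}}{\partial \tilde{x}_j} = \frac{\partial f(x)}{\partial x_j} - \frac{\partial f(x)}{\partial x_{j+1}} \qquad (2) \]

for all $1 \le j < n$. Summing (2) over $j, j+1, \ldots, k-1$ gives us

\[ 0 \le \frac{\partial f(x)}{\partial x_j} - \frac{\partial f(x)}{\partial x_k} \quad \text{for } 1 \le j < k \le n \text{ and } x \in D. \]

Because $f$ is symmetric, this is equivalent to

\[ 0 \le (x_j - x_k)\left(\frac{\partial f}{\partial x_j} - \frac{\partial f}{\partial x_k}\right) \quad \text{for } x \in (\mathbb{R}^n)^+, \]

which is just a way of saying that the variables must be in the same order as the corresponding partial derivatives.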


By way of an example, let’s check that $f(x) = x_1 x_2 \cdots x_n$ is Schur concave on $(\mathbb{R}^n)^+$. In this case, $f_{x_j} = \prod_{i \ne j} x_i$, and so

\[ (x_j - x_k)(f_{x_j} - f_{x_k}) = -(x_j - x_k)^2 \prod_{i \ne j,k} x_i \le 0. \]

It follows that $f(x) \le f(\bar{x})$, where $\bar{x}$ is the constant vector with entries $M_1(x) = (x_1 + \cdots + x_n)/n$, because $\bar{x} \prec x$. That is, we have another proof of the AGM: $x_1 x_2 \cdots x_n \le M_1(x)^n$.


As another example, it’s not hard to check that $f(x) = \|x\|_p$ is Schur convex for $1 \le p < \infty$ (and Schur concave for $0 < p < 1$). Thus, using the same “test vector” as above, we have $f(\bar{x}) \le f(x)$ or: $M_1(x) \le M_p(x)$. It also tells us that the vectors in our first example are, in fact, listed in increasing order of $p$-norm. In this case, $4^{1/p} \le (2 + 2^p)^{1/p} \le (2^p + 2^p)^{1/p} \le (1 + 3^p)^{1/p} \le 4$.
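The ordering of the $p$-norms along the chain in our first example can be checked numerically. This is only a sketch: the helper names `p_norm` and `norms` and the sample exponents are choices made for the illustration.

```python
def p_norm(x, p):
    """(sum |x_i|^p)^(1/p); a genuine norm only when p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

chain = [(1, 1, 1, 1), (0, 1, 2, 1), (2, 0, 2, 0), (1, 3, 0, 0), (0, 0, 4, 0)]

def norms(p):
    """The p-norms of the vectors in the chain, in the order listed."""
    return [p_norm(v, p) for v in chain]

print(norms(2))      # should be nondecreasing along the chain (Schur convex)
print(norms(0.5))    # should be nonincreasing along the chain (Schur concave)
```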


Rather than catalogue more examples, let’s turn our attention to a better understanding of the relation $x \prec y$. This will lead us to another test for Schur convexity.


What’s missing is some justification for the “convexity” in Schur convexity and, ulti- 
mately, its connection with majorization. Filling in the details will take some time, and 


will take us on something of a side trip, but the journey will have its rewards. 
A Bit of Matrix Theory 


We begin with the notion of a permutation matrix. Given a permutation $\sigma$ of $\{1,\ldots,n\}$, the matrix associated with the transformation $T_\sigma(x) = x_\sigma$ is called a permutation matrix. Note that $T_\sigma$ permutes the basis vectors according to $\sigma$; i.e., $T_\sigma(e_i) = e_{\sigma(i)}$, and, hence, its matrix representation is the corresponding permutation of the columns of $I$, the identity matrix. Said in other words, a permutation matrix is a matrix with entries 0 and 1 in which every row and every column has precisely one nonzero entry. The simplest example of a permutation matrix is a transposition matrix, which corresponds to a simple exchange, or transposition, of two basis vectors: to swap $e_i$ and $e_j$, exchange the $i$-th and $j$-th columns of $I$.


It won’t surprise you to learn that the set of permutation matrices on $\mathbb{R}^n$ forms a group (under multiplication) that is isomorphic to $S_n$, the symmetric group on $n$ letters (that is, the group of all permutations on $\{1,\ldots,n\}$). In particular, note that every permutation matrix is invertible and its inverse is again a permutation matrix; indeed, $(T_\sigma)^{-1} = T_{\sigma^{-1}}$.


We will also need the notion of a transfer matrix, defined by $T = (1-\lambda)I + \lambda T_\sigma$, where $0 \le \lambda \le 1$ and where $T_\sigma$ is a transposition matrix. Symbolically, if $T_\sigma$ exchanges $e_i$ and $e_j$, we would have

\[ T = \begin{pmatrix}
1 & & & & \\
& 1-\lambda & \cdots & \lambda & \\
& \vdots & \ddots & \vdots & \\
& \lambda & \cdots & 1-\lambda & \\
& & & & 1
\end{pmatrix} \]

(with the entries $1-\lambda$ in positions $(i,i)$ and $(j,j)$ and the entries $\lambda$ in positions $(i,j)$ and $(j,i)$), where all other diagonal entries are 1 and all other off-diagonal entries are 0. To understand the action of $T$, suppose (for sake of argument) that we have $x_i > x_j$; then $T$ subtracts a bit of the excess (namely, $x_i - x_j$) from $x_i$ and transfers it to $x_j$; that is,

\[ \begin{aligned}
(T(x))_i &= (1-\lambda)x_i + \lambda x_j = x_i - \lambda(x_i - x_j), \\
(T(x))_j &= \lambda x_i + (1-\lambda)x_j = x_j + \lambda(x_i - x_j), \\
(T(x))_k &= x_k, \quad k \ne i, j,
\end{aligned} \qquad (3) \]

where $(T(x))_k$ denotes the $k$-th coordinate of $T(x)$. Also notice that we have $(T(x))_i + (T(x))_j = x_i + x_j$; thus, $x$ and $T(x)$ are comparable and, in fact, $T(x) \prec x$ because $T(x)$ is more evenly distributed than $x$ (but it also follows from direct calculation, as we’ll see shortly). Finally, notice that every transposition matrix (including $I$) is also a transfer matrix (by taking $\lambda = 0$ or $\lambda = 1$).
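To make the action of a transfer matrix concrete, here is a small sketch in Python using exact rational arithmetic. The helper names (`transfer_matrix`, `apply`, `partial_sums`), the 0-based indexing, and the sample vector are all assumptions of this illustration. It builds $T = (1-\lambda)I + \lambda T_\sigma$, applies it, and compares partial sums of the decreasing rearrangements.

```python
from fractions import Fraction
from itertools import accumulate

def transfer_matrix(n, i, j, lam):
    """T = (1 - lam) I + lam T_sigma, where T_sigma swaps e_i and e_j (0-based, i != j)."""
    T = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    T[i][i] = T[j][j] = 1 - lam
    T[i][j] = T[j][i] = lam
    return T

def apply(T, x):
    """Matrix-vector product T x."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]

def partial_sums(x):
    """Partial sums of the decreasing rearrangement x*."""
    return list(accumulate(sorted(x, reverse=True)))

x = [Fraction(5), Fraction(2), Fraction(1)]
Tx = apply(transfer_matrix(3, 0, 2, Fraction(1, 4)), x)   # move a quarter of the excess 5 - 1
print(Tx, partial_sums(Tx), partial_sums(x))
```

Each partial sum for $T(x)$ is at most the corresponding partial sum for $x$, and the totals agree, which is exactly $T(x) \prec x$ for this example.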
More generally, suppose that we’re given a convex combination of several permutation matrices:

\[ D = \lambda_1 T_{\sigma_1} + \cdots + \lambda_k T_{\sigma_k}, \quad \text{where } 0 \le \lambda_1, \ldots, \lambda_k \le 1 \text{ and } \lambda_1 + \cdots + \lambda_k = 1. \]

Now the permutation matrices $T_{\sigma_m}$ each have a single 1 in any given row or column, so their convex combination $D$ must satisfy

\[ \sum_{i=1}^n d_{i,j} = \sum_{m=1}^k \lambda_m = 1 \quad \text{and} \quad \sum_{j=1}^n d_{i,j} = \sum_{m=1}^k \lambda_m = 1 \]

for all $i$ and $j$; that is, every row and every column of $D$ sums to 1. Such a matrix is said to be doubly stochastic. In other words, an $n \times n$ matrix is doubly stochastic if $d_{i,j} \ge 0$, for all $i$, $j$, and every row and every column sums to 1; i.e., $\sum_{i=1}^n d_{i,j} = 1$ and $\sum_{j=1}^n d_{i,j} = 1$ for all $i$ and $j$. (It follows that $d_{i,j} \le 1$ for all $i$, $j$.) As it happens, every doubly stochastic matrix can be written as a convex combination of permutation matrices (Birkhoff’s theorem), so our initial supposition is actually the general case.


Now might be a good time to say a couple of words about permutations and transposi- 
tions. From the proof of our first Proposition, it’s not hard to see that any permutation can 
be written as the product of finitely many transpositions. To see this, let’s first establish 


a useful claim: 


Claim. Given $y \in \mathbb{R}^n$, we can write $y^* = T_k T_{k-1} \cdots T_1 y$ for some (finite) sequence of transposition matrices.

We’ve sketched the proof already: If $i_1$ is the index of the largest coordinate of $y$ and if $\tau_1$ is the transposition that exchanges $i_1$ with 1, then $T_1 = T_{\tau_1}$ moves the largest entry of $y$ into the first coordinate; that is, $T_1 y$ now has its largest entry in the first coordinate. If $i_2 \ne 1$ is the index of the second largest coordinate in $T_1 y$ and if $\tau_2$ exchanges $i_2$ and 2, then $T_2 = T_{\tau_2}$ moves the second largest entry of $T_1 y$ into the second coordinate. That is, $T_2 T_1 y$ agrees with $y^*$ in its first two coordinates. And so on. After at most $n-1$ such exchanges, we’ll arrive at $y^*$.
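The argument just sketched is essentially a selection sort. A minimal Python version, with a hypothetical function name and 0-based positions, records the exchanges it performs so that the bound of $n-1$ transpositions can be seen directly:

```python
def sort_by_transpositions(y):
    """Return (y*, swaps): the decreasing rearrangement and the exchanges used."""
    z = list(y)
    swaps = []
    for pos in range(len(z) - 1):
        # index of the largest entry among positions pos, pos+1, ..., n-1
        big = max(range(pos, len(z)), key=lambda k: z[k])
        if big != pos:
            z[pos], z[big] = z[big], z[pos]    # one transposition
            swaps.append((pos, big))
    return z, swaps

y = [0, 1, 2, 1]
y_star, swaps = sort_by_transpositions(y)
print(y_star, swaps)
```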


Now this argument has very little to do with decreasing rearrangements and everything 
to do with matching a specific permutation of the coordinates of y, so we’ve actually 


established: 


Claim. Given $y \in \mathbb{R}^n$ and $\sigma \in S_n$, we can write $y_\sigma = T_k T_{k-1} \cdots T_1 y$ for some (finite) sequence of transposition matrices. In other words, $T_\sigma = T_k T_{k-1} \cdots T_1$ or, in still other words, $\sigma = \tau_k \tau_{k-1} \cdots \tau_1$. That is, every permutation can be written as the product of finitely many transpositions.


We’re now ready to put some of the pieces together. 




Theorem. Let $x, y \in \mathbb{R}^n$ with $x, y \ge 0$. Then the following are equivalent:

(i) $x$ is a convex combination of (finitely many of) the vectors $\{y_\sigma : \sigma \in S_n\}$.
(ii) $x = Dy$ for some doubly stochastic matrix $D$.
(iii) $x \prec y$.
(iv) $x = T_k T_{k-1} \cdots T_1 y$ for some (finite) sequence of transfer matrices $T_1, \ldots, T_k$.


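Proof. We’ve actually already shown that (i) implies (ii). Indeed, we only need to recall that $y_\sigma = T_\sigma(y)$ for then:

\[ \sum_{i=1}^k \lambda_i y_{\sigma_i} = \lambda_1 y_{\sigma_1} + \cdots + \lambda_k y_{\sigma_k} = (\lambda_1 T_{\sigma_1} + \cdots + \lambda_k T_{\sigma_k})(y) = Dy \]

where $D$ is doubly stochastic.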


To see that (ii) implies (iii), first note that either condition is unaffected by permutations of $x$ or $y$. As we’ve already seen, the condition $x \prec y$ is equivalent to the condition $x_\sigma \prec y_\tau$ for any permutations $\sigma$, $\tau$. Moreover, because the product of a doubly stochastic matrix and a permutation matrix is still doubly stochastic, it follows that the condition $x = Dy$ is equivalent to the condition $x_\sigma = D' y_\tau$ for any permutations $\sigma$, $\tau$ (and some doubly stochastic matrix $D'$, which may depend on $\sigma$ and $\tau$, of course). Thus we may suppose that both $x$ and $y$ are in decreasing order.


Now if $x = Dy$ and if $1 \le k \le n$ is fixed, then we can write

\[ x_i = \sum_{j=1}^n d_{i,j} y_j \implies \sum_{i=1}^k x_i = \sum_{i=1}^k \sum_{j=1}^n d_{i,j} y_j = \sum_{j=1}^n c_j y_j \quad \text{where } c_j = \sum_{i=1}^k d_{i,j}. \]

In particular, if $k = n$, then $c_j = 1$ for all $j$ and, hence, $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. For $k < n$, the fact that $D$ is doubly stochastic implies that $0 \le c_j \le 1$ for all $j$ and

\[ \sum_{j=1}^n c_j = \sum_{i=1}^k \sum_{j=1}^n d_{i,j} = k. \]

Thus, for $1 \le k < n$, we can write (watch closely!):

\[ \begin{aligned}
\sum_{i=1}^k x_i - \sum_{i=1}^k y_i &= \sum_{j=1}^n c_j y_j - \sum_{i=1}^k y_i \\
&= \sum_{j=1}^n c_j y_j - \sum_{i=1}^k y_i + y_k \left(k - \sum_{j=1}^n c_j\right) \\
&= \sum_{j=1}^n c_j (y_j - y_k) + \sum_{i=1}^k (y_k - y_i) \\
&= \sum_{i=1}^k (y_k - y_i)(1 - c_i) + \sum_{i=k+1}^n (y_i - y_k)\, c_i \le 0.
\end{aligned} \]

In other words, we’ve shown that $x \prec y$.


We next show that (iii) implies (iv). Let $x, y \in \mathbb{R}^n$ with $x \prec y$. Again, we may suppose that $x$ and $y$ are decreasing. We may also suppose that $x \ne y$, for otherwise we just take $T_1 = I$ in (iv). Our proof proceeds by induction on the number of coordinates $N$ in which $x$ and $y$ differ. The fact that we have $x_1 + \cdots + x_n = y_1 + \cdots + y_n$ will convince you that $x$ and $y$ cannot differ in only one coordinate, so let’s first suppose that $x$ and $y$ differ in precisely 2 coordinates; that is, for some $1 \le j < k \le n$ we have $x_j \ne y_j$, $x_k \ne y_k$ and $x_i = y_i$ for $i \ne j, k$. Now, because $x_1 + \cdots + x_j \le y_1 + \cdots + y_j = x_1 + \cdots + x_{j-1} + y_j$, we must have $x_j < y_j$. It follows that we then have $x_k > y_k$ to compensate. Thus, $y_j > x_j \ge x_k > y_k$. Finally, notice that $x_j + x_k = y_j + y_k$; that is, $y_j - x_j = x_k - y_k$.

With (3) as our guide, we can now write the transfer matrix that maps $y$ into $x$: We set $T = (1-\lambda)I + \lambda T_{(j,k)}$ where $\lambda = (y_j - x_j)/(y_j - y_k) = (x_k - y_k)/(y_j - y_k)$ and where $T_{(j,k)}$ is the transposition matrix that exchanges $e_j$ and $e_k$. As the following $2 \times 2$ calculation shows, we’ll then have $x = Ty$ because:

\[ \begin{pmatrix} 1-\lambda & \lambda \\ \lambda & 1-\lambda \end{pmatrix} \begin{pmatrix} y_j \\ y_k \end{pmatrix} = \begin{pmatrix} y_j - \lambda(y_j - y_k) \\ y_k + \lambda(y_j - y_k) \end{pmatrix} = \begin{pmatrix} x_j \\ x_k \end{pmatrix} \]

(and because $T$ fixes all other entries of $y$). This proves the case $N = 2$.


Suppose now that (iii) implies (iv) holds whenever $x \prec y$ and $x$ and $y$ differ in $N-1$ or fewer coordinates and that we’re given a pair of (decreasing) sequences with $x \prec y$ where $x$ and $y$ differ in $N$ coordinates. Then, as in the case $N = 2$, the first place where $x$ and $y$ differ will have to satisfy $x_j < y_j$ and, for some $k > j$ we’ll have $x_k > y_k$ to compensate. But, in fact, we must have some pair of coordinates $j < k$ which satisfy:

\[ x_j < y_j, \quad x_k > y_k, \quad \text{and} \quad x_i = y_i \ \text{for } j < i < k. \]

(We might have $k = j+1$, of course; what matters here is that $x$ and $y$ do not differ in any intermediate coordinates.)


Because we’ve assumed that $x$ and $y$ are in decreasing order, this again means that $y_j > x_j \ge x_k > y_k$. In particular, $y_j - y_k > x_j - x_k$ and we’ll transfer a bit of $y_j$ to $y_k$ to arrive at $x$. However, because $x$ and $y$ might differ in several coordinates, we can no longer be assured that $y_j - x_j = x_k - y_k$; instead, we’ll transfer (a portion of) the smaller of the two amounts. Specifically, set $T = (1-\lambda)I + \lambda T_{(j,k)}$ where $T_{(j,k)}$ is the transposition matrix that exchanges $e_j$ and $e_k$, and where $\lambda = \min\{y_j - x_j,\ x_k - y_k\}/(y_j - y_k)$. If we let $\tilde{y} = Ty$, then, as above, $T$ acts on only two coordinates of $y$; in this case,

\[ \begin{pmatrix} 1-\lambda & \lambda \\ \lambda & 1-\lambda \end{pmatrix} \begin{pmatrix} y_j \\ y_k \end{pmatrix} = \begin{pmatrix} y_j - \lambda(y_j - y_k) \\ y_k + \lambda(y_j - y_k) \end{pmatrix} = \begin{pmatrix} y_j - \min\{y_j - x_j,\ x_k - y_k\} \\ y_k + \min\{y_j - x_j,\ x_k - y_k\} \end{pmatrix} = \begin{pmatrix} \tilde{y}_j \\ \tilde{y}_k \end{pmatrix} \]

and $\tilde{y}_i = y_i$ for $i \ne j, k$. It’s reasonably clear from the definition of $\tilde{y}$ that we have

\[ y_j > \tilde{y}_j \ge x_j \ge x_k \ge \tilde{y}_k > y_k. \]

Moreover, because we have either $\lambda = (y_j - x_j)/(y_j - y_k)$ or $\lambda = (x_k - y_k)/(y_j - y_k)$, it follows that we have either $\tilde{y}_j = x_j$ (in the first case) or $\tilde{y}_k = x_k$ (in the second case).
Claim. $x \prec \tilde{y} = Ty \prec y$.

That $\tilde{y} = Ty \prec y$ follows from our proof that (ii) implies (iii). What remains is to show that $x \prec \tilde{y}$. The key to this is the fact that $\tilde{y}_j + \tilde{y}_k = y_j + y_k$. Indeed, we have

\[ \tilde{y}_1 + \cdots + \tilde{y}_i = y_1 + \cdots + y_i \ge x_1 + \cdots + x_i \quad \text{for } i < j \text{ or } i \ge k, \qquad (4) \]
\[ \tilde{y}_1 + \cdots + \tilde{y}_i \ge x_1 + \cdots + x_i \quad \text{for } j \le i < k. \qquad (5) \]

That (4) holds is immediate; that (5) holds follows from the facts that $x_j \le \tilde{y}_j$ and $\tilde{y}_i = y_i = x_i$ for $j < i < k$.

In summary, we’ve shown that there is a transfer matrix $T$ such that $x \prec \tilde{y} = Ty \prec y$ and such that $x$ and $\tilde{y}$ differ in at most $N-1$ coordinates. By our induction hypothesis, there exist finitely many transfer matrices $T_1, \ldots, T_k$ such that $x = T_k \cdots T_1(\tilde{y}) = T_k \cdots T_1(Ty) = (T_k \cdots T_1 T)(y)$. Thus, we’ve shown that (iii) implies (iv).


Finally, the proof that (iv) implies (i) follows by direct calculation and the observation that if $T$ is a transfer matrix, then $T(y) = (1-\lambda)y + \lambda y_\tau$ is a convex combination of permutations of $y$. (In other words, if we write $H$ for the set of all convex combinations of the vectors $y_\sigma$, then for any transfer matrix $T$ we have $T(H) \subset H$.)
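The inductive step in the proof that (iii) implies (iv) is effectively an algorithm. Below is a minimal sketch in Python with exact rational arithmetic; the function names `transfer_steps` and `replay` are hypothetical, indices are 0-based, and the code assumes its inputs really do satisfy $x \prec y$ (it makes no attempt to report a violation gracefully). At each pass it locates a pair $j < k$ with $x_j < y_j$, $x_k > y_k$ and no disagreement in between, then transfers the smaller of the two amounts.

```python
from fractions import Fraction

def transfer_steps(x, y):
    """Assuming x ≺ y, list the transfers (j, k, lam) that carry y* to x*."""
    x = sorted((Fraction(t) for t in x), reverse=True)
    y = sorted((Fraction(t) for t in y), reverse=True)
    steps = []
    while x != y:
        k = next(i for i in range(len(x)) if x[i] > y[i])   # first coordinate where y falls short
        j = max(i for i in range(k) if x[i] < y[i])          # nearest earlier coordinate with excess
        delta = min(y[j] - x[j], x[k] - y[k])                # the smaller of the two amounts
        steps.append((j, k, delta / (y[j] - y[k])))          # lam in T = (1 - lam) I + lam T_(j,k)
        y[j] -= delta
        y[k] += delta
    return steps

def replay(y, steps):
    """Apply the recorded transfers, in order, to the decreasing rearrangement of y."""
    z = sorted((Fraction(t) for t in y), reverse=True)
    for j, k, lam in steps:
        z[j], z[k] = (1 - lam) * z[j] + lam * z[k], lam * z[j] + (1 - lam) * z[k]
    return z

steps = transfer_steps((1, 1, 1, 1), (0, 0, 4, 0))
print(steps)
print(replay((0, 0, 4, 0), steps))
```

Each pass makes at least one more coordinate of the current vector agree with $x$, which mirrors the induction on $N$; for vectors of length $n$ the loop therefore runs at most $n-1$ times.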


Corollary. $\Phi : (\mathbb{R}^n)^+ \to \mathbb{R}$ is Schur convex if and only if $\Phi(T(x)) \le \Phi(x)$ for every transfer matrix $T$ if and only if $\Phi(D(x)) \le \Phi(x)$ for every doubly stochastic matrix $D$.


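Schur’s Majorization Inequality. If $f : I \to \mathbb{R}$ is convex, then the function $\Phi : I^n \to \mathbb{R}$ defined by

\[ \Phi(x) = \sum_{i=1}^n f(x_i) \]

is Schur convex. That is,

\[ \sum_{i=1}^n f(x_i) \le \sum_{i=1}^n f(y_i) \]

whenever $x \prec y$.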


Proof. The proof is virtually immediate from the preceding Corollary and (3). Indeed, if $T$ is a transfer matrix that acts only on the $j$-th and $k$-th coordinates, and if we write $\hat{x} = T(x)$, then, from (3), we have $\hat{x}_j = (1-\lambda)x_j + \lambda x_k$, $\hat{x}_k = \lambda x_j + (1-\lambda)x_k$, and $\hat{x}_i = x_i$ for $i \ne j, k$. It’s then immediate from the convexity of $f$ that $\Phi(T(x)) \le \Phi(x)$.
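For completeness, here is the convexity step written out, a short sketch that writes $\hat{x} = T(x)$ for the transferred vector and assumes $T$ acts on coordinates $j$ and $k$ as in (3):

```latex
\begin{align*}
f(\hat{x}_j) + f(\hat{x}_k)
  &= f\bigl((1-\lambda)x_j + \lambda x_k\bigr) + f\bigl(\lambda x_j + (1-\lambda)x_k\bigr) \\
  &\le \bigl[(1-\lambda)f(x_j) + \lambda f(x_k)\bigr] + \bigl[\lambda f(x_j) + (1-\lambda)f(x_k)\bigr] \\
  &= f(x_j) + f(x_k).
\end{align*}
```

Since the remaining coordinates are unchanged, adding $\sum_{i \ne j,k} f(x_i)$ to both sides gives $\Phi(T(x)) \le \Phi(x)$.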


It’s clear that if $\Phi(x)$ is Schur convex and if $h(x)$ is increasing, then $h(\Phi(x))$ is again Schur convex. This explains why $f(x) = \|x\|_p$ is Schur convex for $1 \le p < \infty$. The function $\Phi(x) = \sum_{i=1}^n x_i^p$ is Schur convex (from the previous Corollary) and $h(x) = x^{1/p}$ is increasing.


By way of another example, we present a classical inequality, due to Muirhead (1903), 


who introduced the transfer method. 


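Muirhead’s Theorem. Let $a_1, \ldots, a_n$ be positive. For $x \in \mathbb{R}^n$ with $x \ge 0$ define

\[ \Phi(x) = \sum_{\sigma \in S_n} a_{\sigma(1)}^{x_1} a_{\sigma(2)}^{x_2} \cdots a_{\sigma(n)}^{x_n}. \]

Then $\Phi$ is Schur convex.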
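Proof. Let $T = (1-\lambda)I + \lambda T_\tau$ be a transfer matrix, where $0 \le \lambda \le 1$ and where $\tau = \tau_{(i,j)}$ denotes the transposition of $i$ and $j$, $i \ne j$. Following (3), we can write

\[ x_i = m + d, \quad x_j = m - d, \quad (T(x))_i = m + \mu d, \quad (T(x))_j = m - \mu d, \]

where $\mu = 1 - 2\lambda$ satisfies $-1 \le \mu \le 1$. Then

\[ \begin{aligned}
2\Phi(x) &= \Phi(x) + \Phi(x_\tau) \\
&= \sum_{\sigma \in S_n} \Bigl(\prod_{k \ne i,j} a_{\sigma(k)}^{x_k}\Bigr) a_{\sigma(i)}^{x_i} a_{\sigma(j)}^{x_j} + \sum_{\sigma \in S_n} \Bigl(\prod_{k \ne i,j} a_{\sigma(k)}^{x_k}\Bigr) a_{\sigma(i)}^{x_j} a_{\sigma(j)}^{x_i} \\
&= \sum_{\sigma \in S_n} \Bigl(\prod_{k \ne i,j} a_{\sigma(k)}^{x_k}\Bigr) \Bigl[a_{\sigma(i)}^{m+d} a_{\sigma(j)}^{m-d} + a_{\sigma(i)}^{m-d} a_{\sigma(j)}^{m+d}\Bigr].
\end{aligned} \]

Similarly,

\[ 2\Phi(T(x)) = \sum_{\sigma \in S_n} \Bigl(\prod_{k \ne i,j} a_{\sigma(k)}^{x_k}\Bigr) \Bigl[a_{\sigma(i)}^{m+\mu d} a_{\sigma(j)}^{m-\mu d} + a_{\sigma(i)}^{m-\mu d} a_{\sigma(j)}^{m+\mu d}\Bigr]. \]

Consequently,

\[ 2\Phi(x) - 2\Phi(T(x)) = \sum_{\sigma \in S_n} \Bigl(\prod_{k \ne i,j} a_{\sigma(k)}^{x_k}\Bigr) \Theta(\sigma), \]

where

\[ \begin{aligned}
\Theta(\sigma) &= a_{\sigma(i)}^m a_{\sigma(j)}^m \Bigl[\bigl(a_{\sigma(i)}^{d} a_{\sigma(j)}^{-d} + a_{\sigma(i)}^{-d} a_{\sigma(j)}^{d}\bigr) - \bigl(a_{\sigma(i)}^{\mu d} a_{\sigma(j)}^{-\mu d} + a_{\sigma(i)}^{-\mu d} a_{\sigma(j)}^{\mu d}\bigr)\Bigr] \\
&= a_{\sigma(i)}^m a_{\sigma(j)}^m \Bigl[\bigl(\gamma_\sigma^{d} + \gamma_\sigma^{-d}\bigr) - \bigl(\gamma_\sigma^{\mu d} + \gamma_\sigma^{-\mu d}\bigr)\Bigr]
\end{aligned} \]

and where $\gamma_\sigma = a_{\sigma(i)}/a_{\sigma(j)} > 0$. But for $\gamma > 0$, the function $f(s) = \gamma^s + \gamma^{-s}$ is even and increasing on $[0,\infty)$, thus $\Theta(\sigma) \ge 0$ because $f(d) \ge f(\mu d)$. Thus, $\Phi(T(x)) \le \Phi(x)$.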


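Now if we insist that $\sum_{i=1}^n x_i = 1$, then the expression

\[ \Phi(a, x) = \frac{1}{n!} \sum_{\sigma \in S_n} a_{\sigma(1)}^{x_1} a_{\sigma(2)}^{x_2} \cdots a_{\sigma(n)}^{x_n} \]

defines a mean (of $a$); it’s an average of geometric-like means over the $n!$ members of $S_n$. With this slight change in notation, we get an interesting extension of the arithmetic-geometric mean inequality:

\[ M_0(a) \le \Phi(a, x) \le M_1(a) \]

because $b = (1/n, 1/n, \ldots, 1/n) \prec x \prec (1, 0, \ldots, 0) = c$ and because

\[ \Phi(a, b) = \frac{1}{n!} \sum_{\sigma \in S_n} \bigl(a_{\sigma(1)} a_{\sigma(2)} \cdots a_{\sigma(n)}\bigr)^{1/n} = (a_1 a_2 \cdots a_n)^{1/n} \]

and

\[ \Phi(a, c) = \frac{1}{n!} \sum_{\sigma \in S_n} a_{\sigma(1)} = \frac{1}{n} \sum_{i=1}^n a_i. \]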
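The mean $\Phi(a, x)$ is easy to compute by brute force when $n$ is small. The following Python sketch is an illustration only: the function name `muirhead_mean`, the sample values of `a`, and the exponent vector are arbitrary choices, and `math.prod` requires Python 3.8 or later. It evaluates $\Phi(a, x)$ at the two extreme exponent vectors $b$ and $c$ and at one vector in between.

```python
from itertools import permutations
from math import factorial, prod

def muirhead_mean(a, x):
    """Phi(a, x): average over all permutations sigma of prod_i a[sigma(i)] ** x[i]."""
    total = sum(prod(t ** e for t, e in zip(perm, x)) for perm in permutations(a))
    return total / factorial(len(a))

a = [1.0, 2.0, 5.0]                          # arbitrary positive sample data
geometric = prod(a) ** (1.0 / len(a))        # M_0(a)
arithmetic = sum(a) / len(a)                 # M_1(a)
middle = muirhead_mean(a, (0.5, 0.3, 0.2))   # exponents sum to 1
print(geometric, middle, arithmetic)
```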
We complete this section with a proof of Garrett Birkhoff’s theorem on doubly stochas- 
tic matrices (1946), a pivotal result in the theory of majorization. Curiously, this is yet 
another example of a famous theorem of uncertain provenance. Some sources claim that 
it was first proved by König in 1936 (in an article on graph theory). The theorem was
discovered independently by John von Neumann in 1953 (in an article on game theory). 


Many contemporary sources refer to it as the Birkhoff-von Neumann theorem. 


Theorem. Every doubly stochastic matrix can be written as a convex combination of 


permutation matrices. 


Proof. We’ll prove the theorem by induction on the number of nonzero entries. It’s clearly true for doubly stochastic matrices with precisely $n$ nonzero entries, as any such matrix is necessarily a permutation matrix. So we suppose that the theorem holds for doubly stochastic matrices with fewer than $k$ nonzero entries, where $n+1 \le k \le n^2$, and that we’re given a doubly stochastic matrix $D$ with precisely $k$ nonzero entries.

In particular, $D$ has some entry $d_{i_1,j_1}$ with $0 < d_{i_1,j_1} < 1$. But the sum of entries in the $i_1$-th row is 1, so there must be some $j_2 \ne j_1$ such that $0 < d_{i_1,j_2} < 1$. But the sum of the entries in the $j_2$-th column is 1, so there is some index $i_2 \ne i_1$ such that $0 < d_{i_2,j_2} < 1$, and so on. This process can be iterated until some pair $(i,j)$ is repeated. Thus we may suppose that we’ve chosen a circuit that begins and ends at $(i_1, j_1)$ and, moreover, that we’ve chosen the shortest such circuit whose entries will then satisfy

\[ 0 < d_{i_s,j_s} < 1 \quad \text{and} \quad 0 < d_{i_s,j_{s+1}} < 1, \qquad s = 1, \ldots, m, \]

where $i_1, \ldots, i_m$ are distinct, $j_1, \ldots, j_m$ are distinct, and where $j_{m+1} = j_1$. (If the circuit produced three consecutive pairs in the same row or column, we could delete one of those pairs, arriving at a shorter circuit with the property we need. Consequently, a minimal circuit will have an even number of “corners,” with no more than two corners in any one row or column.)


We now define an auxiliary $n \times n$ matrix $A$ by setting $a_{i_s,j_s} = 1$, $a_{i_s,j_{s+1}} = -1$, for $1 \le s \le m$, and $a_{i,j} = 0$ otherwise. Note that by our construction every row and every column of $A$ sums to zero. Finally, set

\[ \lambda = \min_{1 \le s \le m} d_{i_s,j_s} \quad \text{and} \quad \mu = \min_{1 \le s \le m} d_{i_s,j_{s+1}}. \]

Then each of the matrices $D - \lambda A$ and $D + \mu A$ has nonnegative entries and each has row and column sums equal to 1. Thus, each is doubly stochastic and, moreover, each has fewer than $k$ nonzero entries. By induction, each can be written as a convex combination of permutation matrices. It follows that the same will be true of $D$ because

\[ D = \frac{\mu}{\lambda + \mu}(D - \lambda A) + \frac{\lambda}{\lambda + \mu}(D + \mu A). \]
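For small matrices, a Birkhoff decomposition can be computed directly. The sketch below does not follow the circuit argument of the proof above; it uses a different, greedy strategy: repeatedly find a permutation supported on the positive entries of what remains, subtract as much of that permutation matrix as possible, and repeat. Such a permutation always exists for a positive multiple of a doubly stochastic matrix. The function name `birkhoff_decomposition`, the brute-force search over all permutations (sensible only for small $n$), and the sample matrix are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import permutations

def birkhoff_decomposition(D):
    """Return [(weight, perm), ...] with D = sum of weight * P_perm; perm[i] is the column of the 1 in row i."""
    n = len(D)
    R = [[Fraction(v) for v in row] for row in D]        # mass not yet accounted for
    pieces = []
    while any(v != 0 for row in R for v in row):
        # brute-force search for a permutation supported on positive entries of R
        perm = next(p for p in permutations(range(n))
                    if all(R[i][p[i]] > 0 for i in range(n)))
        w = min(R[i][perm[i]] for i in range(n))         # largest multiple we can remove
        for i in range(n):
            R[i][perm[i]] -= w                           # zeroes at least one entry
        pieces.append((w, perm))
    return pieces

F = Fraction
D = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 4), F(1, 2)],
     [F(1, 4), F(1, 4), F(1, 2)]]
pieces = birkhoff_decomposition(D)
print(pieces)
```

Each pass zeroes at least one entry of the remainder, so the loop terminates, and the weights sum to 1 because every row of $D$ does.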
MATH 6820 Problem Set 7 August 1, 2011


1. For x, y, z > 0, show that 


2 on 6 a 6 ee ek ee 
x+y 32 + y+ 2z Qe+3yt2)/ ~— 2 yb 25° 


2. For $x \in \mathbb{R}^n$, $n \ge 2$, statisticians use the sample variance

\[ s(x) = \frac{1}{n-1} \sum_{k=1}^n (x_k - \bar{x})^2, \]

where $\bar{x} = (x_1 + \cdots + x_n)/n$, to measure the dispersion of sample data $x = (x_1, \ldots, x_n)$. Information theorists, on the other hand, use the entropy

\[ h(p) = -\sum_{k=1}^n p_k \log p_k \]

to measure the dispersion of probability distributions $p = (p_1, \ldots, p_n)$, where $p_k \ge 0$ and $p_1 + \cdots + p_n = 1$. Show that $s(x)$ is Schur convex on $\mathbb{R}^n$ while $h(p)$ is Schur concave on $(\mathbb{R}^n)^+$.


3. Given $0 < x, y, z < 1$ such that $\max\{x, y, z\} < (x + y + z)/2$, show that


PD ei. 


4. Show that every doubly stochastic matrix has a positive diagonal. [Hint: If $D$ is doubly stochastic, then $D = \lambda_1 P_1 + \cdots + \lambda_k P_k$, where each $P_i$ is a permutation matrix, and where some $\lambda_i > 0$.]


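5. Let $\lambda_1 \ge \cdots \ge \lambda_n$ be the eigenvalues of a Hermitian matrix $A \in M_n(\mathbb{C})$, arranged in decreasing order. Show that for any orthonormal sequence $v_1, \ldots, v_k \in \mathbb{C}^n$ we have

\[ \sum_{j=1}^k \lambda_{n-j+1} \le \sum_{j=1}^k \langle A v_j, v_j \rangle \le \sum_{j=1}^k \lambda_j. \]

Conclude that

\[ \lambda_n \le \inf\{\langle Av, v \rangle : \|v\| = 1\} \le \sup\{\langle Av, v \rangle : \|v\| = 1\} \le \lambda_1. \]

[Hint: For $x \in \mathbb{R}^n$ and $1 \le k \le n$, the function $g(x) = \sum_{i=1}^k x_i^*$ is Schur convex.]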